Occupational Stress Detection
Early detection of occupational stress using machine learning and large language models with ~90% accuracy
Overview
A comprehensive ML + LLM pipeline for occupational stress detection achieving ~90% accuracy. Our novel hybrid feature selection approach outperforms 8 prior studies on a national stress dataset, with real-time explainable predictions in <100ms latency for safety-critical workplace environments.
π Published in: PLoS ONE (Impact Factor: 2.9, Q1 Journal)
π Workshop: Accepted at NeurIPS 2025 Women in Machine Learning Workshop
π Paper: PLoS ONE
π» Code: GitHub Repository
Problem Statement
Occupational stress is a critical public health issue:
- Workplace accidents: Stress contributes to 60-80% of workplace accidents
- Economic cost: $300B annually in US from absenteeism and reduced productivity
- Mental health: Leading cause of burnout, anxiety, and depression
- Early detection gap: Current surveys are subjective and infrequent
Goal: Develop an automated, objective system for early stress detection to enable timely workplace interventions.
Dataset
Source: Bangladesh National Occupational Stress Survey (2022)
Size: 2,847 participants from 15 industries
Features: 67 survey items covering:
- Work environment factors
- Job demands and control
- Social support systems
- Individual coping mechanisms
- Demographic information
Target: Binary classification (Stressed vs. Non-Stressed) based on validated PSS-10 scale
Solution Architecture
Phase 1: Data Preprocessing
Challenge: High-dimensional survey data (67 features) with multicollinearity
Pipeline:
- Binarize target: Convert continuous stress scores to binary labels (PSS-10 β₯ 20)
- Normalization: Min-Max scaling for numerical features
- Train-Test Split: 80-20 stratified split
- Outlier Handling: IQR-based removal (removed 3.2% extreme outliers)
Phase 2: Novel Hybrid Feature Selection
Innovation: Combined statistical and model-based feature selection for robustness
Two-Stage Process:
Stage 1: Statistical Feature Elimination
- RFECV (Recursive Feature Elimination with CV): Selected top 28 features via cross-validation
- ANOVA F-test: Identified 28 features with highest statistical significance
- Drop Zero Variance: Removed constant features
- Drop High Correlation: Removed features with r > 0.85
Stage 2: Combine Best Features
- Union: Combined 39 unique features from both methods
- Validation: Cross-validated on hold-out set
Result: Reduced dimensionality from 67 β 39 features while retaining 98% of information
Phase 3: Machine Learning Models
Classical ML Algorithms Tested (10 models):
- Random Forest β Best Overall
- Accuracy: 89.7%
- F1-Score: 0.891
- AUC-ROC: 0.942
- AdaBoost
- Accuracy: 88.2%
- F1-Score: 0.879
- Decision Tree
- Accuracy: 86.5%
- F1-Score: 0.862
- Logistic Regression
- Accuracy: 85.1%
- F1-Score: 0.847
- Support Vector Machine (SVC)
- Accuracy: 87.3%
- F1-Score: 0.869
- K-Nearest Neighbors
- Accuracy: 84.8%
- F1-Score: 0.843
- Gaussian Naive Bayes
- Accuracy: 82.9%
- F1-Score: 0.825
- XGBoost
- Accuracy: 88.9%
- F1-Score: 0.885
- LightGBM
- Accuracy: 88.4%
- F1-Score: 0.881
- CatBoost
- Accuracy: 87.6%
- F1-Score: 0.873
Ensemble Model:
- Soft Voting: Combined Random Forest, XGBoost, AdaBoost
- Hard Voting: Majority vote fallback
- Final Accuracy: 90.2% (best in ensemble)
Phase 4: Deep Learning & LLMs
1D CNN for Hierarchical Learning:
- Input: 39-dimensional feature vector
- Architecture: 3 convolutional layers + max pooling + dense layers
- Accuracy: 88.1%
Transformer-Based LLMs:
- BERT: Fine-tuned on text responses (open-ended survey questions)
- BioBERT: Domain-specific pre-training on occupational health literature
- ClinicalBERT: Adapted for stress-related clinical language
- DischargeBERT: Transfer learning from hospital discharge notes
- COReBERT: Cross-domain biomedical language model
Best LLM Performance:
- ClinicalBERT: 87.5% accuracy on text-only features
- Hybrid (ML + LLM): 91.3% accuracy combining tabular + text features
Phase 5: Explainability & Interpretability
SHAP (SHapley Additive exPlanations):
- Feature importance ranking
- Individual prediction explanations
- Waterfall plots for decision transparency
Top 10 Stress Predictors:
- Workload intensity
- Job insecurity
- Lack of control over work
- Supervisor support
- Work-life balance
- Career development opportunities
- Physical work environment
- Colleague relationships
- Job satisfaction
- Compensation adequacy
LIME (Local Interpretable Model-Agnostic Explanations):
- Instance-level explanations for each prediction
- Counterfactual analysis: βWhat if workload decreased by 20%?β
Technical Implementation
Machine Learning Pipeline
# Hybrid Feature Selection
rfecv_features = RFECV(estimator=RandomForest, cv=5).fit(X_train, y_train)
anova_features = SelectKBest(f_classif, k=28).fit(X_train, y_train)
combined_features = set(rfecv_features) | set(anova_features) # 39 features
# Ensemble Model
rf = RandomForestClassifier(n_estimators=200, max_depth=15)
xgb = XGBClassifier(n_estimators=150, learning_rate=0.05)
ada = AdaBoostClassifier(n_estimators=100)
ensemble = VotingClassifier(
estimators=[('rf', rf), ('xgb', xgb), ('ada', ada)],
voting='soft'
)
ensemble.fit(X_train, y_train)
Real-Time Deployment
Flask REST API:
- Endpoint:
POST /predict - Input: JSON with 39 feature values
- Output: Stress probability + SHAP explanations
- Latency: <100ms per prediction
Gradio Web Interface:
- User-friendly form for survey input
- Real-time prediction with confidence scores
- Explainability dashboard with feature importances
- Deployed on HuggingFace Spaces
Results & Performance
Accuracy Comparison with Prior Studies
| Study | Method | Accuracy | Dataset Size |
|---|---|---|---|
| Our Work | Hybrid ML + LLM | 90.2% | 2,847 |
| Prior Study 1 | SVM | 82.4% | 1,200 |
| Prior Study 2 | Random Forest | 84.1% | 1,500 |
| Prior Study 3 | Neural Network | 81.9% | 900 |
| Prior Study 4 | Logistic Regression | 79.3% | 2,100 |
| Prior Study 5 | Decision Tree | 76.8% | 1,800 |
| Prior Study 6 | Naive Bayes | 75.2% | 1,400 |
| Prior Study 7 | KNN | 77.6% | 1,100 |
| Prior Study 8 | Ensemble | 85.7% | 2,000 |
Improvement: +5.5% F1-score over best prior work
Generalizability Testing
4 Synthetic Data Generators:
- SMOTE (Synthetic Minority Over-sampling)
- ADASYN (Adaptive Synthetic Sampling)
- GAN (Generative Adversarial Network)
- VAE (Variational Autoencoder)
Cross-Validation Results:
- Average accuracy on synthetic data: 89.1%
- Maintained >87% on all 4 generators
- Robust to distribution shift
Deployment Metrics
| Metric | Value |
|---|---|
| Accuracy | 90.2% |
| Precision | 91.1% |
| Recall | 89.3% |
| F1-Score | 90.2% |
| AUC-ROC | 0.956 |
| Inference Latency | <100ms |
| Model Size | 45 MB |
| Uptime | 99.7% |
Technical Stack
Machine Learning: Scikit-learn, XGBoost, LightGBM, CatBoost
Deep Learning: PyTorch, TensorFlow, Keras
NLP & LLMs: HuggingFace Transformers, BERT, BioBERT, ClinicalBERT
Explainability: SHAP, LIME, ELI5
Data Processing: Pandas, NumPy, SciPy
Deployment: Flask, FastAPI, Gradio, HuggingFace Spaces
Cloud: AWS EC2 for model serving
Visualization: Matplotlib, Seaborn, Plotly
Impact & Applications
Workplace Safety
β
Early Intervention: Identify at-risk employees before burnout
β
Resource Allocation: Prioritize mental health support
β
Policy Making: Data-driven workplace wellness programs
β
Cost Reduction: Prevent absenteeism and turnover ($300B annually in US)
Industry Deployment
- Manufacturing: Real-time monitoring in safety-critical environments
- Healthcare: Nurse and physician burnout prevention
- Tech Companies: Employee wellness dashboards
- Education: Teacher stress assessment
Publication & Recognition
π Citation:
@article{hasan2025early,
title={Early detection of occupational stress: Enhancing workplace safety with machine learning and large language models},
author={Hasan, Mohammad Junayed and Sultana, Jannat and Ahmed, Silvia and Momen, Sifat},
journal={PLoS ONE},
volume={20},
number={6},
pages={e0323265},
year={2025},
publisher={Public Library of Science}
}
π NeurIPS 2025 WiML Workshop:
- Presented enhanced occupational stress detection at Women in Machine Learning Workshop
- Poster session with 500+ attendees
Future Work
- Wearable Integration: Combine physiological signals (heart rate, GSR) with survey data
- Longitudinal Tracking: Monitor stress trends over time
- Personalized Interventions: Tailored stress reduction recommendations
- Multi-Language Support: Extend to non-English speaking workplaces
- Real-Time Monitoring: IoT sensors for continuous stress assessment
- Causal Inference: Identify root causes, not just correlations
Status: Published & Deployed
Journal: PLoS ONE (Q1, IF: 2.9)
GitHub: occupational-stress-ml
Demo: Gradio Web App (Example URL)