Occupational Stress Detection | Mohammad Junayed Hasan

Overview

A comprehensive ML + LLM pipeline for occupational stress detection that achieves ~90% accuracy. Our novel hybrid feature selection approach outperforms 8 prior studies on a national stress dataset and delivers real-time, explainable predictions at under 100 ms latency for safety-critical workplace environments.

📄 Published in: PLoS ONE (Impact Factor: 2.9, Q1 Journal)
🏆 Workshop: Accepted at NeurIPS 2025 Women in Machine Learning Workshop
🔗 Paper: PLoS ONE
💻 Code: GitHub Repository

Problem Statement

Occupational stress is a critical public health issue:

Workplace accidents: Stress contributes to 60-80% of workplace accidents
Economic cost: $300B annually in US from absenteeism and reduced productivity
Mental health: Leading cause of burnout, anxiety, and depression
Early detection gap: Current surveys are subjective and infrequent

Goal: Develop an automated, objective system for early stress detection to enable timely workplace interventions.

Dataset

Source: Bangladesh National Occupational Stress Survey (2022)
Size: 2,847 participants from 15 industries
Features: 67 survey items covering:

Work environment factors
Job demands and control
Social support systems
Individual coping mechanisms
Demographic information

Target: Binary classification (Stressed vs. Non-Stressed) based on validated PSS-10 scale

Solution Architecture

Phase 1: Data Preprocessing

Challenge: High-dimensional survey data (67 features) with multicollinearity

Pipeline:

Binarize target: Convert continuous PSS-10 stress scores to binary labels (Stressed if PSS-10 ≥ 20, otherwise Non-Stressed)
Normalization: Min-Max scaling for numerical features
Train-Test Split: 80-20 stratified split
Outlier Handling: IQR-based removal (removed 3.2% extreme outliers)
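
The four preprocessing steps above can be sketched in plain NumPy. This is a minimal illustration, not the published implementation: the function name, the 1.5 × IQR fence, the step ordering (outliers removed before the split), and fitting the scaler on the training split only are all assumptions.

```python
import numpy as np

def preprocess(X, pss_scores, threshold=20, test_frac=0.2, seed=0):
    """Binarize PSS-10, drop IQR outliers, stratified split, Min-Max scale."""
    X = np.asarray(X, dtype=float)
    y = (np.asarray(pss_scores) >= threshold).astype(int)  # 1 = Stressed

    # IQR rule (assumed 1.5 * IQR fence): keep rows with every feature inside the fence.
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    iqr = q3 - q1
    keep = np.all((X >= q1 - 1.5 * iqr) & (X <= q3 + 1.5 * iqr), axis=1)
    X, y = X[keep], y[keep]

    # Stratified split: draw the test fraction separately within each class.
    rng = np.random.default_rng(seed)
    test_idx = []
    for label in (0, 1):
        idx = rng.permutation(np.flatnonzero(y == label))
        test_idx.extend(idx[: int(round(test_frac * len(idx)))])
    test_mask = np.zeros(len(y), dtype=bool)
    test_mask[test_idx] = True

    X_train, X_test = X[~test_mask], X[test_mask]
    # Min-Max scaling fitted on the training split only, to avoid test leakage.
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant columns
    return (X_train - lo) / span, (X_test - lo) / span, y[~test_mask], y[test_mask]
```

Fitting the scaling bounds on the training rows alone means test values can fall slightly outside [0, 1], which is expected.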

Phase 2: Novel Hybrid Feature Selection

Innovation: Combined statistical and model-based feature selection for robustness

Two-Stage Process:

Stage 1: Statistical Feature Elimination

RFECV (Recursive Feature Elimination with Cross-Validation): Selected the top 28 features via cross-validation
ANOVA F-test: Identified 28 features with highest statistical significance
Drop Zero Variance: Removed constant features
Drop High Correlation: Removed features with pairwise correlation |r| > 0.85

Stage 2: Combine Best Features

Union: Took the union of the RFECV and ANOVA selections, yielding 39 unique features
Validation: Cross-validated on hold-out set

Result: Reduced dimensionality from 67 → 39 features while retaining 98% of information
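
The two filter steps in Stage 1 (zero variance and high correlation) could look roughly like the sketch below. The function name and the greedy "keep the first feature of a correlated pair" rule are assumptions; the source only states the 0.85 threshold.

```python
import numpy as np

def drop_constant_and_correlated(X, names, corr_threshold=0.85):
    """Return the feature names kept after the two filters."""
    X = np.asarray(X, dtype=float)
    # 1) Drop zero-variance (constant) columns.
    keep = [j for j in range(X.shape[1]) if X[:, j].var() > 0.0]
    # 2) Greedily drop the later feature of any pair with |r| above the threshold.
    corr = np.atleast_2d(np.abs(np.corrcoef(X[:, keep], rowvar=False)))
    selected = []
    for a in range(len(keep)):
        if all(corr[a, b] <= corr_threshold for b in selected):
            selected.append(a)
    return [names[keep[a]] for a in selected]
```

Which member of a correlated pair to keep is a design choice; keeping the one with the higher ANOVA F-score would be an equally reasonable alternative.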

Phase 3: Machine Learning Models

Classical ML Algorithms Tested (10 models):

Random Forest ⭐ Best Overall
- Accuracy: 89.7%
- F1-Score: 0.891
- AUC-ROC: 0.942
AdaBoost
- Accuracy: 88.2%
- F1-Score: 0.879
Decision Tree
- Accuracy: 86.5%
- F1-Score: 0.862
Logistic Regression
- Accuracy: 85.1%
- F1-Score: 0.847
Support Vector Machine (SVC)
- Accuracy: 87.3%
- F1-Score: 0.869
K-Nearest Neighbors
- Accuracy: 84.8%
- F1-Score: 0.843
Gaussian Naive Bayes
- Accuracy: 82.9%
- F1-Score: 0.825
XGBoost
- Accuracy: 88.9%
- F1-Score: 0.885
LightGBM
- Accuracy: 88.4%
- F1-Score: 0.881
CatBoost
- Accuracy: 87.6%
- F1-Score: 0.873

Ensemble Model:

Soft Voting: Combined Random Forest, XGBoost, AdaBoost
Hard Voting: Majority vote fallback
Final Accuracy: 90.2% (ensemble, highest among the classical ML models)
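
To make the difference between the two voting schemes concrete, here is a small worked example. The probabilities are invented for illustration and are not outputs of the published models.

```python
import numpy as np

# Hypothetical P(Stressed) from three models (rows) for four employees (columns).
probs = np.array([
    [0.90, 0.45, 0.20, 0.55],   # e.g. Random Forest
    [0.80, 0.40, 0.30, 0.45],   # e.g. XGBoost
    [0.60, 0.95, 0.10, 0.48],   # e.g. AdaBoost
])

def soft_vote(probs, threshold=0.5):
    """Average the class probabilities, then threshold the mean."""
    return (probs.mean(axis=0) >= threshold).astype(int)

def hard_vote(probs, threshold=0.5):
    """Threshold each model first, then take the majority label."""
    votes = (probs >= threshold).astype(int)
    return (votes.sum(axis=0) * 2 > votes.shape[0]).astype(int)

soft = soft_vote(probs)   # [1, 1, 0, 0]
hard = hard_vote(probs)   # [1, 0, 0, 0]
```

The second employee shows where the schemes diverge: one very confident model pulls the averaged probability to 0.60 under soft voting, while hard voting sides with the two models that voted Non-Stressed.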

Phase 4: Deep Learning & LLMs

1D CNN for Hierarchical Learning:

Input: 39-dimensional feature vector
Architecture: 3 convolutional layers + max pooling + dense layers
Accuracy: 88.1%
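
As a rough illustration of how a 39-dimensional feature vector flows through three convolution stages, the NumPy forward pass below tracks the shapes. Kernel size 3, pooling after every convolution, the channel widths (8, 16, 32), and the random weights are all assumptions; the source gives only the layer counts.

```python
import numpy as np

def conv1d(x, kernels, bias):
    """Valid 1D convolution plus ReLU. x: (C_in, L), kernels: (C_out, C_in, K)."""
    k = kernels.shape[2]
    windows = np.lib.stride_tricks.sliding_window_view(x, k, axis=1)  # (C_in, L-K+1, K)
    out = np.einsum("clk,ock->ol", windows, kernels) + bias[:, None]
    return np.maximum(out, 0.0)

def max_pool1d(x, size=2):
    """Non-overlapping max pooling; a trailing remainder is dropped."""
    c, length = x.shape
    trimmed = x[:, : (length // size) * size]
    return trimmed.reshape(c, -1, size).max(axis=2)

rng = np.random.default_rng(0)
h = rng.normal(size=(1, 39))                      # one sample, one channel, 39 features
for c_in, c_out in [(1, 8), (8, 16), (16, 32)]:   # hypothetical channel widths
    h = conv1d(h, rng.normal(size=(c_out, c_in, 3)) * 0.1, np.zeros(c_out))
    h = max_pool1d(h)                             # lengths: 39 -> 37 -> 18 -> 16 -> 8 -> 6 -> 3

# Dense head: flatten and map to a single stress probability with a sigmoid.
w = rng.normal(size=h.size) * 0.1
p = 1.0 / (1.0 + np.exp(-(h.ravel() @ w)))
```

In practice this would be written with PyTorch or Keras layers and trained end to end; the sketch only shows the data flow.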

Transformer-Based LLMs:

BERT: Fine-tuned on text responses (open-ended survey questions)
BioBERT: Domain-specific pre-training on biomedical literature (PubMed abstracts and PMC articles)
ClinicalBERT: Adapted for stress-related clinical language
DischargeBERT: Transfer learning from hospital discharge notes
COReBERT: Cross-domain biomedical language model

Best LLM Performance:

ClinicalBERT: 87.5% accuracy on text-only features
Hybrid (ML + LLM): 91.3% accuracy combining tabular + text features

Phase 5: Explainability & Interpretability

SHAP (SHapley Additive exPlanations):

Feature importance ranking
Individual prediction explanations
Waterfall plots for decision transparency
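
For intuition about what SHAP attributes to each feature, Shapley values can be computed exactly by brute force on a tiny model. The scorer `toy_risk`, its weights, and the input values below are hypothetical. The SHAP library uses much faster model-specific algorithms, since this enumeration grows exponentially with the number of features.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values; features outside a coalition take the baseline value."""
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def toy_risk(v):
    """Hypothetical scorer with one interaction term (workload x lack of control)."""
    workload, insecurity, control = v
    return 0.4 * workload + 0.3 * insecurity + 0.2 * workload * (1.0 - control)

x = [0.9, 0.5, 0.2]
baseline = [0.5, 0.5, 0.5]
phi = shapley_values(toy_risk, x, baseline)
```

The attributions sum to the difference between the prediction for `x` and the prediction for `baseline`, which is the property that waterfall plots visualize.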

Top 10 Stress Predictors:

Workload intensity
Job insecurity
Lack of control over work
Supervisor support
Work-life balance
Career development opportunities
Physical work environment
Colleague relationships
Job satisfaction
Compensation adequacy

LIME (Local Interpretable Model-Agnostic Explanations):

Instance-level explanations for each prediction
Counterfactual analysis: “What if workload decreased by 20%?”
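
A what-if question such as the workload example can be probed by perturbing one input and comparing predictions. The sketch below is a simple sensitivity check, not LIME itself (LIME fits a local surrogate model around the instance). The logistic weights and feature names are made up for illustration.

```python
import math

# Hypothetical logistic scorer over three scaled features (weights are invented).
WEIGHTS = {"workload": 2.0, "job_insecurity": 1.2, "supervisor_support": -1.5}
BIAS = -0.8

def stress_probability(features):
    z = BIAS + sum(WEIGHTS[k] * features[k] for k in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

def what_if(features, name, relative_change):
    """Return (before, after) probabilities when one feature is scaled."""
    changed = dict(features)
    changed[name] = features[name] * (1.0 + relative_change)
    return stress_probability(features), stress_probability(changed)

employee = {"workload": 0.9, "job_insecurity": 0.6, "supervisor_support": 0.3}
before, after = what_if(employee, "workload", -0.20)  # workload decreased by 20%
```

With a real model, `stress_probability` would be replaced by the trained classifier's probability output for the Stressed class.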

Technical Implementation

Machine Learning Pipeline

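# Assumes X_train and y_train come from the preprocessing phase
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier, VotingClassifier
from sklearn.feature_selection import RFECV, SelectKBest, f_classif
from xgboost import XGBClassifier

# Hybrid Feature Selection: fit both selectors on the training data
rfecv = RFECV(estimator=RandomForestClassifier(), cv=5).fit(X_train, y_train)
anova = SelectKBest(f_classif, k=28).fit(X_train, y_train)
# Union of the two boolean support masks (39 features in the reported run)
combined_mask = rfecv.get_support() | anova.get_support()
X_train_sel = np.asarray(X_train)[:, combined_mask]

# Ensemble Model
rf = RandomForestClassifier(n_estimators=200, max_depth=15)
xgb = XGBClassifier(n_estimators=150, learning_rate=0.05)
ada = AdaBoostClassifier(n_estimators=100)

# Soft voting averages the predicted class probabilities of the three models
ensemble = VotingClassifier(
    estimators=[('rf', rf), ('xgb', xgb), ('ada', ada)],
    voting='soft'
)
ensemble.fit(X_train_sel, y_train)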

Real-Time Deployment

Flask REST API:

Endpoint: POST /predict
Input: JSON with 39 feature values
Output: Stress probability + SHAP explanations
Latency: <100ms per prediction
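
The request handling behind an endpoint like POST /predict might be organized as in the framework-agnostic sketch below, which a Flask route could call with the raw request body. The `features` field name, the placeholder `predict_proba` model, and the response keys are assumptions; SHAP explanations are omitted for brevity.

```python
import json
import time

N_FEATURES = 39

def predict_proba(features):
    """Placeholder model standing in for the trained ensemble (hypothetical)."""
    return sum(features) / len(features)

def handle_predict(body):
    """Validate a JSON request body and return (status_code, response_dict)."""
    start = time.perf_counter()
    try:
        payload = json.loads(body)
        features = [float(v) for v in payload["features"]]
    except (ValueError, KeyError, TypeError):
        return 400, {"error": "body must be JSON with a numeric 'features' list"}
    if len(features) != N_FEATURES:
        return 400, {"error": f"expected {N_FEATURES} features, got {len(features)}"}
    probability = predict_proba(features)
    latency_ms = (time.perf_counter() - start) * 1000.0
    return 200, {"stress_probability": probability, "latency_ms": latency_ms}
```

Measuring latency inside the handler, as done here, covers validation and inference but not network time, so an end-to-end figure would need to be measured from the client.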

Gradio Web Interface:

User-friendly form for survey input
Real-time prediction with confidence scores
Explainability dashboard with feature importances
Deployed on HuggingFace Spaces

Results & Performance

Accuracy Comparison with Prior Studies

Study	Method	Accuracy	Dataset Size
Our Work	Hybrid ML + LLM	90.2%	2,847
Prior Study 1	SVM	82.4%	1,200
Prior Study 2	Random Forest	84.1%	1,500
Prior Study 3	Neural Network	81.9%	900
Prior Study 4	Logistic Regression	79.3%	2,100
Prior Study 5	Decision Tree	76.8%	1,800
Prior Study 6	Naive Bayes	75.2%	1,400
Prior Study 7	KNN	77.6%	1,100
Prior Study 8	Ensemble	85.7%	2,000

Improvement: +5.5% F1-score over best prior work

Generalizability Testing

4 Synthetic Data Generators:

SMOTE (Synthetic Minority Over-sampling)
ADASYN (Adaptive Synthetic Sampling)
GAN (Generative Adversarial Network)
VAE (Variational Autoencoder)

Cross-Validation Results:

Average accuracy on synthetic data: 89.1%
Maintained >87% on all 4 generators
Robust to distribution shift
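
Of the four generators, SMOTE is the simplest to illustrate: each synthetic row is an interpolation between a minority-class sample and one of its nearest minority neighbors. The NumPy sketch below shows that core idea under assumed defaults (k = 5, Euclidean distance); it is not the code used in the study, and libraries such as imbalanced-learn provide tested SMOTE and ADASYN implementations.

```python
import numpy as np

def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic rows by interpolating toward a random k-nearest neighbor."""
    minority = np.asarray(minority, dtype=float)
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances within the minority class.
    dists = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)            # a point is not its own neighbor
    k = min(k, len(minority) - 1)
    neighbors = np.argsort(dists, axis=1)[:, :k]
    base = rng.integers(0, len(minority), size=n_new)
    picked = neighbors[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))               # interpolation factor in [0, 1)
    return minority[base] + gap * (minority[picked] - minority[base])
```

Because every synthetic row lies on a segment between two real minority rows, the generated data stays inside the observed feature ranges.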

Deployment Metrics

Metric	Value
Accuracy	90.2%
Precision	91.1%
Recall	89.3%
F1-Score	90.2%
AUC-ROC	0.956
Inference Latency	<100ms
Model Size	45 MB
Uptime	99.7%
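
As a consistency check on the table, the F1-score is the harmonic mean of precision (P) and recall (R), and the reported values reproduce it:

```latex
\[
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.911 \times 0.893}{0.911 + 0.893}
    = \frac{1.627}{1.804}
    \approx 0.902
\]
```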

Technical Stack

Machine Learning: Scikit-learn, XGBoost, LightGBM, CatBoost
Deep Learning: PyTorch, TensorFlow, Keras
NLP & LLMs: HuggingFace Transformers, BERT, BioBERT, ClinicalBERT
Explainability: SHAP, LIME, ELI5
Data Processing: Pandas, NumPy, SciPy
Deployment: Flask, FastAPI, Gradio, HuggingFace Spaces
Cloud: AWS EC2 for model serving
Visualization: Matplotlib, Seaborn, Plotly

Impact & Applications

Workplace Safety

✅ Early Intervention: Identify at-risk employees before burnout
✅ Resource Allocation: Prioritize mental health support
✅ Policy Making: Data-driven workplace wellness programs
✅ Cost Reduction: Prevent absenteeism and turnover ($300B annually in US)

Industry Deployment

Manufacturing: Real-time monitoring in safety-critical environments
Healthcare: Nurse and physician burnout prevention
Tech Companies: Employee wellness dashboards
Education: Teacher stress assessment

Publication & Recognition

📄 Citation:

@article{hasan2025early,
  title={Early detection of occupational stress: Enhancing workplace safety with machine learning and large language models},
  author={Hasan, Mohammad Junayed and Sultana, Jannat and Ahmed, Silvia and Momen, Sifat},
  journal={PLoS ONE},
  volume={20},
  number={6},
  pages={e0323265},
  year={2025},
  publisher={Public Library of Science}
}

🏆 NeurIPS 2025 WiML Workshop:

Presented the enhanced occupational stress detection pipeline at the Women in Machine Learning Workshop
Poster session with 500+ attendees

Future Work

Wearable Integration: Combine physiological signals (heart rate, GSR) with survey data
Longitudinal Tracking: Monitor stress trends over time
Personalized Interventions: Tailored stress reduction recommendations
Multi-Language Support: Extend to non-English speaking workplaces
Real-Time Monitoring: IoT sensors for continuous stress assessment
Causal Inference: Identify root causes, not just correlations

Status: Published & Deployed
Journal: PLoS ONE (Q1, IF: 2.9)
GitHub: occupational-stress-ml
Demo: Gradio Web App (Example URL)