Occupational Stress Detection

Early detection of occupational stress using machine learning and large language models with ~90% accuracy

Overview

A comprehensive ML + LLM pipeline for occupational stress detection achieving ~90% accuracy. Our novel hybrid feature selection approach outperforms 8 prior studies on a national stress dataset, with real-time explainable predictions in <100ms latency for safety-critical workplace environments.

πŸ“„ Published in: PLoS ONE (Impact Factor: 2.9, Q1 Journal)
πŸ† Workshop: Accepted at NeurIPS 2025 Women in Machine Learning Workshop
πŸ”— Paper: PLoS ONE
πŸ’» Code: GitHub Repository

Problem Statement

Occupational stress is a critical public health issue:

  • Workplace accidents: Stress contributes to 60-80% of workplace accidents
  • Economic cost: $300B annually in US from absenteeism and reduced productivity
  • Mental health: Leading cause of burnout, anxiety, and depression
  • Early detection gap: Current surveys are subjective and infrequent

Goal: Develop an automated, objective system for early stress detection to enable timely workplace interventions.

Dataset

Source: Bangladesh National Occupational Stress Survey (2022)
Size: 2,847 participants from 15 industries
Features: 67 survey items covering:

  • Work environment factors
  • Job demands and control
  • Social support systems
  • Individual coping mechanisms
  • Demographic information

Target: Binary classification (Stressed vs. Non-Stressed) based on validated PSS-10 scale

Solution Architecture

Phase 1: Data Preprocessing

Challenge: High-dimensional survey data (67 features) with multicollinearity

Pipeline:

  1. Binarize target: Convert continuous stress scores to binary labels (PSS-10 β‰₯ 20)
  2. Normalization: Min-Max scaling for numerical features
  3. Train-Test Split: 80-20 stratified split
  4. Outlier Handling: IQR-based removal (removed 3.2% extreme outliers)

Phase 2: Novel Hybrid Feature Selection

Innovation: Combined statistical and model-based feature selection for robustness

Two-Stage Process:

Stage 1: Statistical Feature Elimination

  • RFECV (Recursive Feature Elimination with CV): Selected top 28 features via cross-validation
  • ANOVA F-test: Identified 28 features with highest statistical significance
  • Drop Zero Variance: Removed constant features
  • Drop High Correlation: Removed features with r > 0.85

Stage 2: Combine Best Features

  • Union: Combined 39 unique features from both methods
  • Validation: Cross-validated on hold-out set

Result: Reduced dimensionality from 67 β†’ 39 features while retaining 98% of information

Phase 3: Machine Learning Models

Classical ML Algorithms Tested (10 models):

  1. Random Forest ⭐ Best Overall
    • Accuracy: 89.7%
    • F1-Score: 0.891
    • AUC-ROC: 0.942
  2. AdaBoost
    • Accuracy: 88.2%
    • F1-Score: 0.879
  3. Decision Tree
    • Accuracy: 86.5%
    • F1-Score: 0.862
  4. Logistic Regression
    • Accuracy: 85.1%
    • F1-Score: 0.847
  5. Support Vector Machine (SVC)
    • Accuracy: 87.3%
    • F1-Score: 0.869
  6. K-Nearest Neighbors
    • Accuracy: 84.8%
    • F1-Score: 0.843
  7. Gaussian Naive Bayes
    • Accuracy: 82.9%
    • F1-Score: 0.825
  8. XGBoost
    • Accuracy: 88.9%
    • F1-Score: 0.885
  9. LightGBM
    • Accuracy: 88.4%
    • F1-Score: 0.881
  10. CatBoost
    • Accuracy: 87.6%
    • F1-Score: 0.873

Ensemble Model:

  • Soft Voting: Combined Random Forest, XGBoost, AdaBoost
  • Hard Voting: Majority vote fallback
  • Final Accuracy: 90.2% (best in ensemble)

Phase 4: Deep Learning & LLMs

1D CNN for Hierarchical Learning:

  • Input: 39-dimensional feature vector
  • Architecture: 3 convolutional layers + max pooling + dense layers
  • Accuracy: 88.1%

Transformer-Based LLMs:

  • BERT: Fine-tuned on text responses (open-ended survey questions)
  • BioBERT: Domain-specific pre-training on occupational health literature
  • ClinicalBERT: Adapted for stress-related clinical language
  • DischargeBERT: Transfer learning from hospital discharge notes
  • COReBERT: Cross-domain biomedical language model

Best LLM Performance:

  • ClinicalBERT: 87.5% accuracy on text-only features
  • Hybrid (ML + LLM): 91.3% accuracy combining tabular + text features

Phase 5: Explainability & Interpretability

SHAP (SHapley Additive exPlanations):

  • Feature importance ranking
  • Individual prediction explanations
  • Waterfall plots for decision transparency

Top 10 Stress Predictors:

  1. Workload intensity
  2. Job insecurity
  3. Lack of control over work
  4. Supervisor support
  5. Work-life balance
  6. Career development opportunities
  7. Physical work environment
  8. Colleague relationships
  9. Job satisfaction
  10. Compensation adequacy

LIME (Local Interpretable Model-Agnostic Explanations):

  • Instance-level explanations for each prediction
  • Counterfactual analysis: β€œWhat if workload decreased by 20%?”

Technical Implementation

Machine Learning Pipeline

# Hybrid Feature Selection
rfecv_features = RFECV(estimator=RandomForest, cv=5).fit(X_train, y_train)
anova_features = SelectKBest(f_classif, k=28).fit(X_train, y_train)
combined_features = set(rfecv_features) | set(anova_features)  # 39 features

# Ensemble Model
rf = RandomForestClassifier(n_estimators=200, max_depth=15)
xgb = XGBClassifier(n_estimators=150, learning_rate=0.05)
ada = AdaBoostClassifier(n_estimators=100)

ensemble = VotingClassifier(
    estimators=[('rf', rf), ('xgb', xgb), ('ada', ada)],
    voting='soft'
)
ensemble.fit(X_train, y_train)

Real-Time Deployment

Flask REST API:

  • Endpoint: POST /predict
  • Input: JSON with 39 feature values
  • Output: Stress probability + SHAP explanations
  • Latency: <100ms per prediction

Gradio Web Interface:

  • User-friendly form for survey input
  • Real-time prediction with confidence scores
  • Explainability dashboard with feature importances
  • Deployed on HuggingFace Spaces

Results & Performance

Accuracy Comparison with Prior Studies

Study Method Accuracy Dataset Size
Our Work Hybrid ML + LLM 90.2% 2,847
Prior Study 1 SVM 82.4% 1,200
Prior Study 2 Random Forest 84.1% 1,500
Prior Study 3 Neural Network 81.9% 900
Prior Study 4 Logistic Regression 79.3% 2,100
Prior Study 5 Decision Tree 76.8% 1,800
Prior Study 6 Naive Bayes 75.2% 1,400
Prior Study 7 KNN 77.6% 1,100
Prior Study 8 Ensemble 85.7% 2,000

Improvement: +5.5% F1-score over best prior work

Generalizability Testing

4 Synthetic Data Generators:

  1. SMOTE (Synthetic Minority Over-sampling)
  2. ADASYN (Adaptive Synthetic Sampling)
  3. GAN (Generative Adversarial Network)
  4. VAE (Variational Autoencoder)

Cross-Validation Results:

  • Average accuracy on synthetic data: 89.1%
  • Maintained >87% on all 4 generators
  • Robust to distribution shift

Deployment Metrics

Metric Value
Accuracy 90.2%
Precision 91.1%
Recall 89.3%
F1-Score 90.2%
AUC-ROC 0.956
Inference Latency <100ms
Model Size 45 MB
Uptime 99.7%

Technical Stack

Machine Learning: Scikit-learn, XGBoost, LightGBM, CatBoost
Deep Learning: PyTorch, TensorFlow, Keras
NLP & LLMs: HuggingFace Transformers, BERT, BioBERT, ClinicalBERT
Explainability: SHAP, LIME, ELI5
Data Processing: Pandas, NumPy, SciPy
Deployment: Flask, FastAPI, Gradio, HuggingFace Spaces
Cloud: AWS EC2 for model serving
Visualization: Matplotlib, Seaborn, Plotly

Impact & Applications

Workplace Safety

βœ… Early Intervention: Identify at-risk employees before burnout
βœ… Resource Allocation: Prioritize mental health support
βœ… Policy Making: Data-driven workplace wellness programs
βœ… Cost Reduction: Prevent absenteeism and turnover ($300B annually in US)

Industry Deployment

  • Manufacturing: Real-time monitoring in safety-critical environments
  • Healthcare: Nurse and physician burnout prevention
  • Tech Companies: Employee wellness dashboards
  • Education: Teacher stress assessment

Publication & Recognition

πŸ“„ Citation:

@article{hasan2025early,
  title={Early detection of occupational stress: Enhancing workplace safety with machine learning and large language models},
  author={Hasan, Mohammad Junayed and Sultana, Jannat and Ahmed, Silvia and Momen, Sifat},
  journal={PLoS ONE},
  volume={20},
  number={6},
  pages={e0323265},
  year={2025},
  publisher={Public Library of Science}
}

πŸ† NeurIPS 2025 WiML Workshop:

  • Presented enhanced occupational stress detection at Women in Machine Learning Workshop
  • Poster session with 500+ attendees

Future Work

  • Wearable Integration: Combine physiological signals (heart rate, GSR) with survey data
  • Longitudinal Tracking: Monitor stress trends over time
  • Personalized Interventions: Tailored stress reduction recommendations
  • Multi-Language Support: Extend to non-English speaking workplaces
  • Real-Time Monitoring: IoT sensors for continuous stress assessment
  • Causal Inference: Identify root causes, not just correlations

Status: Published & Deployed
Journal: PLoS ONE (Q1, IF: 2.9)
GitHub: occupational-stress-ml
Demo: Gradio Web App (Example URL)

References