OptimCLM
Optimizing clinical language models via knowledge distillation, pruning, and quantization
International Journal of Medical Informatics (IJMI), 2025
OptimCLM is a framework for compressing clinical language models with minimal loss in predictive performance. By combining ensemble knowledge distillation, pruning, and quantization, it achieves large reductions in model size and inference latency on clinical NLP tasks.
Key Results
- 22.9× Compression: Reduced model size by 95% with minimal performance loss
- 28.7× Faster: Reduced inference latency by a factor of 28.7 relative to the uncompressed baseline
- 98% Performance Retention: Maintained near-original accuracy across clinical tasks
- State-of-the-Art: Achieved best results on 4 clinical NLP benchmarks
Methodology
Three-Stage Pipeline:
- Ensemble Distillation: Transfer knowledge from 32 clinical LLMs to a compact student model
- Structured Pruning: Remove redundant parameters while preserving clinical semantics
- INT8 Quantization: Compress weights and activations for edge deployment
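The first stage, ensemble distillation, can be illustrated with a standard soft-target distillation loss: the student is trained against the averaged logits of the teacher ensemble plus the usual cross-entropy on hard labels. This is a minimal sketch under common KD conventions (temperature-scaled KL divergence, weighting factor `alpha`), not the paper's exact objective; the function name and hyperparameter values are illustrative.

```python
import torch
import torch.nn.functional as F

def ensemble_distillation_loss(student_logits, teacher_logits_list, labels,
                               temperature=2.0, alpha=0.5):
    """Hypothetical sketch: alpha * soft-target KL + (1 - alpha) * hard-label CE."""
    # Average the teachers' logits to form the ensemble target distribution.
    ensemble_logits = torch.stack(teacher_logits_list).mean(dim=0)
    soft_targets = F.softmax(ensemble_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence between student and ensemble, scaled by T^2 as in
    # standard knowledge distillation (Hinton et al.).
    kd = F.kl_div(log_student, soft_targets, reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```

Averaging teacher logits before softmax is one common way to form an ensemble target; per-teacher KL terms averaged after the fact are an equally valid variant.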
Data and Tasks: electronic health record (EHR) text; clinical outcome prediction tasks
Framework: PyTorch, Hugging Face Transformers
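Since the framework is built on PyTorch, the two post-distillation stages can be sketched with standard PyTorch utilities: structured pruning of linear layers followed by post-training dynamic INT8 quantization. This is an illustrative recipe under assumed settings (a toy student network, a 30% pruning ratio, L2-norm row pruning), not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy stand-in for a distilled student model (illustrative architecture).
student = nn.Sequential(nn.Linear(768, 256), nn.ReLU(), nn.Linear(256, 2))

# Structured pruning: zero the 30% of output rows with the smallest L2 norm.
for module in student.modules():
    if isinstance(module, nn.Linear):
        prune.ln_structured(module, name="weight", amount=0.3, n=2, dim=0)
        prune.remove(module, "weight")  # bake the pruning mask into the weights

# Dynamic INT8 quantization: weights stored as int8, activations quantized
# on the fly at inference time (CPU-friendly, suited to edge deployment).
quantized = torch.quantization.quantize_dynamic(
    student, {nn.Linear}, dtype=torch.qint8
)

logits = quantized(torch.randn(1, 768))
```

Dynamic quantization is the lightest-touch option since it needs no calibration data; static quantization with calibration would be the natural next step if activation statistics are stable.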
Links
- Paper: ScienceDirect
- Code: GitHub Repository
Clinical Impact
This work enables deployment of sophisticated clinical language models in resource-constrained healthcare settings, making AI-powered clinical decision support accessible to hospitals with limited computational infrastructure.
Status: Published in International Journal of Medical Informatics (2025), Volume 195, Article 105764
Authors: Mohammad Junayed Hasan, Fuad Rahman, Nabeel Mohammed