Adult Income Prediction
Machine learning analysis of census data to predict individual income levels with comprehensive preprocessing and visualization
Overview
A machine learning project that analyzes the Adult Income Dataset to predict individual income levels from census data. It covers an end-to-end ML pipeline: data preprocessing, visualization, feature engineering, and predictive modeling.
💻 GitHub: Adult-Income-Prediction-Machine-Learning
📊 Dataset: UCI Adult Census Income
🎯 Task: Binary Classification (>50K vs <=50K)
Project Highlights
Data Exploration & Visualization
- 48,842 samples with 14 features (age, education, occupation, etc.)
- Exploratory Data Analysis (EDA) with correlation matrices
- Distribution plots for numerical and categorical features
- Imbalanced class handling (76% <=50K, 24% >50K)
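With roughly three quarters of the rows in the <=50K class, accuracy alone can flatter a model, so it helps to compare against a majority-class baseline. The sketch below uses toy labels that only mirror the reported 76% / 24% split; it does not load the real dataset.

```python
from collections import Counter

# Toy labels mirroring the reported 76% / 24% split (illustrative, not the real data).
labels = ["<=50K"] * 76 + [">50K"] * 24

counts = Counter(labels)
total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# A classifier that always predicts the majority class reaches this accuracy.
majority_label, majority_count = counts.most_common(1)[0]
baseline_accuracy = majority_count / total

print(shares)
print(f"Always predicting {majority_label!r} gives accuracy {baseline_accuracy:.2f}")
```

Any model accuracy is best read relative to this baseline, alongside precision and recall on the minority class.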
Data Preprocessing
- Missing Value Handling: imputation for ‘?’ entries (the dataset's placeholder for missing values)
- Encoding: label encoding for ordinal features, one-hot encoding for nominal features
- Scaling: StandardScaler for numerical features
- Feature Engineering: Created interaction features
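The preprocessing steps above could be wired together roughly as follows. This is a minimal sketch, not the project's actual code: the four-row frame is made up, the column names follow the UCI Adult schema, and most-frequent imputation is an assumed strategy (the page does not say which one was used). Ordinal encoding and interaction features are left out for brevity.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny illustrative frame; values are invented for the example.
df = pd.DataFrame({
    "age": [39, 50, 38, 53],
    "hours-per-week": [40, 13, 40, 40],
    "workclass": ["State-gov", "?", "Private", "Private"],
    "occupation": ["Adm-clerical", "Exec-managerial", "?", "Handlers-cleaners"],
})

# Treat the dataset's '?' placeholder as a missing value.
df = df.replace("?", np.nan)

numeric_cols = ["age", "hours-per-week"]
categorical_cols = ["workclass", "occupation"]

# Assumed strategy: fill missing categories with the most frequent value,
# then one-hot encode the nominal columns.
categorical_pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])

preprocessor = ColumnTransformer(
    [
        ("num", StandardScaler(), numeric_cols),
        ("cat", categorical_pipeline, categorical_cols),
    ],
    sparse_threshold=0.0,  # always return a dense array for easy inspection
)

X = preprocessor.fit_transform(df)
print(X.shape)
```

Keeping the steps inside a `ColumnTransformer` means the same fitted transformations can be applied to held-out data without leaking test-set statistics.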
Machine Learning Models
| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| Random Forest | 86.2% | 0.741 | 0.623 | 0.677 |
| XGBoost | 85.7% | 0.726 | 0.615 | 0.666 |
| Logistic Regression | 84.3% | 0.685 | 0.598 | 0.638 |
| SVM | 83.9% | 0.672 | 0.589 | 0.628 |
| Naive Bayes | 80.1% | 0.615 | 0.552 | 0.582 |
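F1 is the harmonic mean of precision and recall, so the last column can be sanity-checked from the two before it. The snippet below recomputes it from the rounded table values; differences in the third decimal are expected because the inputs are already rounded. The table does not state which class the metrics refer to; the >50K class is the likely candidate, but that is an assumption.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) copied from the table above.
reported = {
    "Random Forest": (0.741, 0.623, 0.677),
    "XGBoost": (0.726, 0.615, 0.666),
    "Logistic Regression": (0.685, 0.598, 0.638),
    "SVM": (0.672, 0.589, 0.628),
    "Naive Bayes": (0.615, 0.552, 0.582),
}

for model, (precision, recall, f1_reported) in reported.items():
    f1 = f1_from(precision, recall)
    print(f"{model}: computed {f1:.4f}, reported {f1_reported:.3f}")
```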
Feature Importance
Top 5 Predictors:
- Education Level
- Occupation
- Age
- Hours per Week
- Marital Status
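The page does not say how this ranking was produced. One common approach is to read impurity-based importances from a fitted tree ensemble, sketched here on synthetic data with hypothetical feature names rather than the census columns.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_samples = 500

# Hypothetical stand-in features: only the first one drives the label.
X = rng.normal(size=(n_samples, 3))
y = (X[:, 0] > 0).astype(int)
feature_names = ["signal", "noise_a", "noise_b"]

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# feature_importances_ is normalized to sum to 1 across features.
ranked = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```

Impurity-based importances can favor high-cardinality features, so permutation importance on held-out data is a useful cross-check.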
Tech Stack
ML Libraries: Scikit-learn, XGBoost, Pandas, NumPy
Visualization: Matplotlib, Seaborn, Plotly
Preprocessing: Scikit-learn preprocessing, feature_engine
Notebook: Jupyter
Key Learnings
✅ Handling real-world messy census data
✅ Dealing with class imbalance (SMOTE, class weighting)
✅ Feature engineering for categorical data
✅ Model comparison and hyperparameter tuning
✅ Interpretability with feature importance
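As an illustration of the class-weighting idea, the sketch below computes "balanced" weights with the formula `n_samples / (n_classes * class_count)`, which is what scikit-learn applies for `class_weight="balanced"`. The labels are toy values mirroring the reported 76% / 24% split, and the helper name is hypothetical. SMOTE is not shown here.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class inversely to its frequency (hypothetical helper)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}

# Toy labels mirroring the reported split (illustrative, not the real data).
labels = ["<=50K"] * 76 + [">50K"] * 24
weights = balanced_class_weights(labels)

for label, weight in weights.items():
    print(f"{label}: {weight:.3f}")
```

The minority class receives the larger weight, so misclassifying a >50K example costs more during training than misclassifying a <=50K one.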
Applications
- HR Analytics: Salary prediction models
- Market Segmentation: Income-based targeting
- Policy Making: Understanding income determinants
- Financial Services: Credit risk assessment
Status: Completed
Dataset: UCI Adult Census
License: MIT