Introduction
Bike-sharing systems have gained popularity in urban areas, which makes accurate demand forecasting important. This project uses AutoGluon, an automated machine learning (AutoML) library, to predict bike-sharing demand. The goal was to refine predictions through exploratory data analysis (EDA), feature engineering, and hyperparameter tuning.
Initial Training
The initial predictions needed no adjustment before submission because all predicted values were non-negative. The submission file, however, had to be written without the index column, since the extra column would otherwise cause an error on Kaggle.
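The two checks described above can be sketched with pandas. This is a minimal illustration, not the project's actual code: the column names `datetime` and `count`, the helper `build_submission`, and the sample values are assumptions made for the example.

```python
import io

import pandas as pd


def build_submission(datetimes, predictions):
    """Return a submission frame with any negative predictions clipped to zero.

    The column names "datetime" and "count" are assumed for illustration.
    """
    preds = pd.Series(predictions, dtype=float).clip(lower=0)
    return pd.DataFrame({"datetime": list(datetimes), "count": preds})


submission = build_submission(
    ["2011-01-20 00:00:00", "2011-01-20 01:00:00"],  # made-up timestamps
    [12.4, -0.7],  # made-up predictions, one negative on purpose
)

# index=False keeps the DataFrame index out of the file, so the CSV
# contains only the two expected columns.
buffer = io.StringIO()
submission.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
```

In this project the clipping step would have been a no-op, since no prediction was negative, but keeping it makes the submission step safe for later runs.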
Top Performing Model
The top-ranked model in every training run was WeightedEnsemble_L3, which outperformed the other models by combining the predictions of multiple base models.
Exploratory Data Analysis and Feature Creation
EDA involved checking for missing values (none were found) and conducting correlation analysis. No pair of independent variables was highly correlated, so multicollinearity was unlikely to affect the model's performance. Feature engineering improved the model substantially by adding categorical features: a time category derived from the date variable, plus wind, humidity, and temperature categories. These features reduced the error score (lower is better) from 1.84 to 0.6538.
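One way to build such category features is to bin the raw columns with `pandas.cut`. The sketch below is illustrative only: the column names (`datetime`, `temp`, `humidity`, `windspeed`), the bin edges, and the labels are all assumptions, since the report does not state the thresholds that were used.

```python
import pandas as pd

INF = float("inf")


def add_category_features(df):
    """Return a copy of df with binned time, temperature, humidity, and wind columns.

    All bin edges and labels are hypothetical values chosen for illustration.
    """
    out = df.copy()
    out["datetime"] = pd.to_datetime(out["datetime"])
    hour = out["datetime"].dt.hour

    # Intervals are right-inclusive: (-1, 5], (5, 11], (11, 17], (17, 23].
    out["time_category"] = pd.cut(
        hour, bins=[-1, 5, 11, 17, 23],
        labels=["night", "morning", "afternoon", "evening"],
    )
    out["temp_category"] = pd.cut(
        out["temp"], bins=[-INF, 10, 25, INF], labels=["cold", "mild", "hot"]
    )
    out["humidity_category"] = pd.cut(
        out["humidity"], bins=[-INF, 40, 70, INF], labels=["low", "medium", "high"]
    )
    out["wind_category"] = pd.cut(
        out["windspeed"], bins=[-INF, 10, 20, INF], labels=["calm", "breezy", "windy"]
    )
    return out


sample = pd.DataFrame(
    {
        "datetime": ["2011-01-01 08:00:00", "2011-07-01 18:00:00"],  # made-up rows
        "temp": [9.8, 30.0],
        "humidity": [81, 40],
        "windspeed": [0.0, 25.0],
    }
)
features = add_category_features(sample)
```

Because `pandas.cut` returns categorical columns, a tabular learner can treat the new features as discrete groups rather than as numeric values.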
Hyperparameter Tuning
Hyperparameter tuning further reduced the error, from 0.6538 to 0.49. Key modifications included adjusting the number of boosting rounds, tree depth, and other XGBoost and Random Forest parameters. The gain over the feature-engineered model supports the value of hyperparameter optimization.
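As a rough sketch of what such a configuration can look like, AutoGluon's `TabularPredictor.fit` accepts a `hyperparameters` dictionary keyed by model type (`"XGB"` for XGBoost, `"RF"` for Random Forest) and a `num_bag_folds` argument for bagging. Every value below is a hypothetical placeholder, not the setting used in this project.

```python
# Hypothetical per-model settings; the numbers are placeholders for illustration.
hyperparameters = {
    "XGB": {
        "n_estimators": 500,     # number of boosting rounds
        "max_depth": 6,          # maximum tree depth
        "learning_rate": 0.05,   # shrinkage applied to each boosting step
        "subsample": 0.8,        # fraction of rows sampled per tree
    },
    "RF": {
        "n_estimators": 300,     # number of trees in the forest
        "max_depth": 12,         # maximum tree depth
    },
}

# Keyword arguments that could be forwarded to TabularPredictor.fit(...).
fit_kwargs = {
    "hyperparameters": hyperparameters,
    "num_bag_folds": 5,  # assumed number of bagging folds
}
```

With AutoGluon installed, these arguments would be passed as `predictor.fit(train_data, **fit_kwargs)`, where `predictor` and `train_data` stand for the project's own predictor and training table.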
Future Improvements
Given more time, additional focus would be placed on generating new features and refining hyperparameter tuning using advanced methods available in Scikit-learn. Further feature selection techniques could also improve model accuracy.
Model Performance Table
| Model | HPO1 | HPO2 | HPO3 | Score |
|---|---|---|---|---|
| Initial | Default | Default | Default | 1.84 |
| Add Features | Default | Default | Time, Wind, Humidity, Temperature Category | 0.6538 |
| HPO | Boosting Rounds, Depth | Bagging Folds | Learning Rate, Subsampling, Feature Selection | 0.4899 |
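The scores in this table are error values, so lower is better. The report does not name the metric; assuming it is the root mean squared logarithmic error (RMSLE), a common choice for count forecasts, it can be computed as in the following sketch. The function name `rmsle` is chosen for the example.

```python
import math


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error for non-negative values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must not be empty")
    # log1p(x) = log(1 + x), which stays defined when a count is zero.
    squared_errors = [
        (math.log1p(pred) - math.log1p(true)) ** 2
        for true, pred in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


perfect_score = rmsle([10, 20, 30], [10, 20, 30])
```

Because the metric takes logarithms, it is undefined for predictions below -1, which is one reason to confirm that predictions are non-negative before submitting.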
Model Performance Visualization
Two line plots illustrate the improvements:
- Model Training Scores: Displays performance trends across training runs.
- Model Test Scores: Depicts Kaggle score progression across submissions.
Summary
This project used AutoGluon to predict bike-sharing demand, with feature engineering and hyperparameter tuning together lowering the error from 1.84 to 0.49. Despite these improvements, further refinement is needed to develop a more accurate model.