Predicting Bike Sharing Demand with AutoGluon

Introduction

Bike-sharing systems have gained popularity in urban areas, necessitating accurate demand forecasting models. This project explores the use of AutoGluon to predict bike-sharing demand by leveraging automated machine learning (AutoML) techniques. The goal was to refine predictions through exploratory data analysis (EDA), feature engineering, and hyperparameter tuning.

Initial Training

The initial predictions needed no post-processing before submission, because every predicted value was already non-negative (a negative value is not a valid rental count). The submission file, however, had to be written without the index column, which would otherwise cause an error on Kaggle.
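As a minimal sketch of that submission step, the snippet below clips any negative prediction to zero and writes a two-column file with no index column. The column names `datetime` and `count`, the helper names, and the sample values are assumptions for illustration and are not taken from the project's source code.

```python
import csv
import io


def clip_negative(predictions):
    """Replace negative predictions with 0.0, since a count cannot be negative."""
    return [max(0.0, float(p)) for p in predictions]


def write_submission(datetimes, predictions, out):
    """Write a two-column CSV (no index column) to the file-like object `out`."""
    writer = csv.writer(out)
    writer.writerow(["datetime", "count"])  # assumed column names
    for timestamp, prediction in zip(datetimes, clip_negative(predictions)):
        writer.writerow([timestamp, prediction])


# Usage with made-up values, written to memory instead of a file on disk.
buffer = io.StringIO()
write_submission(
    ["2011-01-20 00:00:00", "2011-01-20 01:00:00"],
    [12.4, -0.3],
    buffer,
)
submission_lines = buffer.getvalue().splitlines()
```

When the predictions live in a pandas DataFrame, passing `index=False` to `DataFrame.to_csv` has the same effect of leaving the index column out of the file.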

Top Performing Model

WeightedEnsemble_L3 was the top-ranked model in every training run. It outperformed the individual models by combining the predictions of multiple base models.
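To make the idea of a weighted ensemble concrete, here is a simplified sketch of the averaging step only. AutoGluon learns its ensemble weights from validation data, which this sketch does not attempt; the function name, model names, weights, and predictions below are all hypothetical.

```python
def weighted_ensemble(base_predictions, weights):
    """Combine per-model prediction lists into one list by weighted average.

    base_predictions: dict of model name -> list of predictions (equal lengths)
    weights: dict of model name -> non-negative weight
    """
    if set(base_predictions) != set(weights):
        raise ValueError("every base model needs exactly one weight")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")

    n_rows = len(next(iter(base_predictions.values())))
    combined = [0.0] * n_rows
    for name, predictions in base_predictions.items():
        if len(predictions) != n_rows:
            raise ValueError("all models must predict the same number of rows")
        share = weights[name] / total  # normalise so the shares sum to 1
        for i, value in enumerate(predictions):
            combined[i] += share * value
    return combined


# Usage with two hypothetical base models.
base = {"model_a": [10.0, 20.0], "model_b": [20.0, 40.0]}
blended = weighted_ensemble(base, {"model_a": 0.75, "model_b": 0.25})
```

In AutoGluon itself, `predictor.leaderboard()` lists the fitted models with their validation scores, which is where a ranking like the one described above can be read off.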

Exploratory Data Analysis and Feature Creation

EDA involved checking for missing values (none were found) and conducting correlation analysis. No pair of independent variables was highly correlated, so multicollinearity was not a concern. Feature engineering produced the largest gain: new categorical features were added, namely a time category derived from the date variable and wind, humidity, and temperature categories. These features reduced the error score from 1.84 to 0.6538 (lower is better).
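The kind of feature creation described above can be sketched as follows. This is an illustration under stated assumptions: the column names (`datetime`, `temp`, `humidity`, `windspeed`), the bin edges, and the rush-hour ranges are all hypothetical, since the article does not give the values it used.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical bin edges; the article does not state the thresholds it used.
TEMP_EDGES = [10.0, 25.0]
HUMIDITY_EDGES = [40.0, 70.0]
WIND_EDGES = [10.0, 25.0]
LABELS = ["low", "medium", "high"]


def bucket(value, edges, labels):
    """Map a number to a label; a value equal to an edge goes to the upper bin."""
    return labels[bisect_right(edges, value)]


def time_category(hour):
    """Group the hour of day into coarse periods (assumed ranges)."""
    if 7 <= hour <= 9:
        return "morning_rush"
    if 11 <= hour <= 13:
        return "lunch"
    if 17 <= hour <= 19:
        return "evening_rush"
    return "off_peak"


def engineer(row):
    """Return a copy of `row` with date parts and categorical features added."""
    ts = datetime.strptime(row["datetime"], "%Y-%m-%d %H:%M:%S")
    return {
        **row,
        "hour": ts.hour,
        "weekday": ts.weekday(),  # Monday is 0
        "month": ts.month,
        "time_category": time_category(ts.hour),
        "temp_category": bucket(row["temp"], TEMP_EDGES, LABELS),
        "humidity_category": bucket(row["humidity"], HUMIDITY_EDGES, LABELS),
        "wind_category": bucket(row["windspeed"], WIND_EDGES, LABELS),
    }


# Usage with a single made-up row.
example = engineer(
    {"datetime": "2011-01-20 08:00:00", "temp": 9.84, "humidity": 81, "windspeed": 0.0}
)
```

Keeping the bin edges in named constants makes it easy to try different thresholds between training runs without touching the transformation logic.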

Hyperparameter Tuning

Hyperparameter tuning reduced the error further, to 0.49. Key modifications included adjusting boosting rounds, tree depth, and other XGBoost and Random Forest parameters. The tuned model scored better than the earlier runs, which shows the value of hyperparameter optimization.
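A hedged sketch of how such overrides might be passed to AutoGluon is shown below. The specific values, the label name `count`, the time limit, and the helper name `fit_tuned` are assumptions; the article does not list the settings it actually used. AutoGluon is imported inside the function so the configuration can be inspected without the library installed.

```python
# Illustrative, fixed values only; not the settings from the article.
HYPERPARAMETERS = {
    "XGB": {  # XGBoost: boosting rounds, tree depth, learning rate, subsampling
        "n_estimators": 500,
        "max_depth": 6,
        "learning_rate": 0.05,
        "subsample": 0.8,
    },
    "RF": {  # Random Forest: number of trees and tree depth
        "n_estimators": 300,
        "max_depth": 15,
    },
}
NUM_BAG_FOLDS = 5  # assumed number of bagging folds


def fit_tuned(train_data, label="count", time_limit=600):
    """Fit a TabularPredictor with the overrides above (requires autogluon)."""
    from autogluon.tabular import TabularPredictor  # imported lazily

    predictor = TabularPredictor(
        label=label, eval_metric="root_mean_squared_error"
    )
    predictor.fit(
        train_data,
        time_limit=time_limit,
        hyperparameters=HYPERPARAMETERS,
        num_bag_folds=NUM_BAG_FOLDS,
    )
    return predictor
```

This sketch fixes each value by hand. AutoGluon can also search over ranges when `fit` is given search spaces together with `hyperparameter_tune_kwargs`, which may be closer to what the tuned run did.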

Future Improvements

Given more time, additional focus would be placed on generating new features and refining hyperparameter tuning using advanced methods available in Scikit-learn. Further feature selection techniques could also improve model accuracy.

Model Performance Table

| Model | HPO1 | HPO2 | HPO3 | Score |
|---|---|---|---|---|
| Initial | Default | Default | Default | 0.49 |
| Add Features | Default | Default | Time, Wind, Humidity, Temperature Category | 0.49 |
| HPO | Boosting Rounds, Depth | Bagging Folds | Learning Rate, Subsampling, Feature Selection | 0.4899 |

Model Performance Visualization

Two line plots illustrate the improvements:

  • Model Training Scores: Displays performance trends across training runs.
  • Model Test Scores: Depicts Kaggle score progression across submissions.
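A plot of the Kaggle score progression could be produced along these lines. The scores are the three error values quoted in the prose of this article; the run labels, function names, and output path are assumptions, and matplotlib is imported inside the plotting function so the rest of the snippet runs without it.

```python
RUNS = ["initial", "add_features", "hpo"]  # assumed run labels
KAGGLE_SCORES = [1.84, 0.6538, 0.49]  # error values quoted in the text


def relative_reduction(before, after):
    """Fraction by which the error dropped between two runs."""
    return (before - after) / before


def plot_scores(labels, scores, title, path):
    """Save a line plot of scores per run to `path` (requires matplotlib)."""
    import matplotlib

    matplotlib.use("Agg")  # render to a file; no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(8, 6))
    ax.plot(labels, scores, marker="o")
    ax.set_title(title)
    ax.set_xlabel("run")
    ax.set_ylabel("score (lower is better)")
    fig.savefig(path)
    plt.close(fig)


# Error reduction from each step, as fractions of the previous score.
feature_gain = relative_reduction(KAGGLE_SCORES[0], KAGGLE_SCORES[1])
tuning_gain = relative_reduction(KAGGLE_SCORES[1], KAGGLE_SCORES[2])

# Example call (hypothetical output path):
# plot_scores(RUNS, KAGGLE_SCORES, "Model Test Scores", "model_test_score.png")
```

Reporting the relative reduction alongside the plot makes it clear how much of the overall gain came from feature engineering and how much from tuning.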

Summary

This project successfully leveraged AutoGluon to predict bike-sharing demand through rigorous feature engineering and hyperparameter tuning. Despite improvements, further refinement is needed to develop a more accurate model.

Click here to see the source code
