Analysing the Ames Housing Market
Using a dataset from kaggle.com, I analysed the Ames Housing Market. The purpose of this project was threefold:
- Estimate the value of homes from their fixed characteristics.
- Determine how much of the value left unexplained by the fixed characteristics can be attributed to changeable (renovatable) characteristics.
- Determine which property characteristics predict an “abnormal” sale.
Please click here for the full notebook.
Estimating the value of homes was first a matter of deciding which fixed (non-renovatable) housing features to use. I then trained a regression model on the houses sold before 2010 and tested it on the houses sold in 2010. The model used scikit-learn's BaggingRegressor and GradientBoostingRegressor, and explained 87.5% of the variance in sale price.
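As a rough illustration of the time-based split described above, here is a minimal sketch on synthetic data. The column names (`YrSold`, `GrLivArea`, `LotArea`, `SalePrice`), the choice of features, and the way the two estimators are combined (bagging wrapped around gradient boosting) are assumptions made for the example; the notebook may do this differently.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import BaggingRegressor, GradientBoostingRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
n = 400

# Synthetic stand-in for the housing data; column names are assumed, values are made up.
df = pd.DataFrame({
    "YrSold": rng.integers(2006, 2011, size=n),      # sale years 2006..2010
    "GrLivArea": rng.normal(1500, 400, size=n),
    "LotArea": rng.normal(10000, 2500, size=n),
})
df["SalePrice"] = (
    50 * df["GrLivArea"] + 2 * df["LotArea"] + rng.normal(0, 5000, size=n)
)

fixed_features = ["GrLivArea", "LotArea"]  # hypothetical "fixed" feature set

# Train on sales before 2010, hold out the 2010 sales for testing.
train = df[df["YrSold"] < 2010]
test = df[df["YrSold"] == 2010]

# Assumption: bagging is applied on top of gradient boosting.
# The base estimator is passed positionally so the call works across
# scikit-learn versions that name this parameter differently.
model = BaggingRegressor(
    GradientBoostingRegressor(random_state=0),
    n_estimators=5,
    random_state=0,
)
model.fit(train[fixed_features], train["SalePrice"])

r2 = r2_score(test["SalePrice"], model.predict(test[fixed_features]))
print(f"Held-out R^2 on 2010 sales: {r2:.3f}")
```

Splitting by sale year rather than at random means the test score reflects how the model performs on sales it could not have seen at training time.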
To estimate the value of renovations, I trained a second model on the renovatable features to predict the residuals (errors) of the first model. This makes the second model predict the remaining variance in sale price that the first model was unable to explain. This time the model used a BaggingRegressor with RidgeCV (ridge regression with built-in cross-validation), and had an R² score of 0.22, i.e. it explained 22% of that remaining variance.
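The two-stage idea can be sketched as follows, again on synthetic data. The feature matrices, the alpha grid, and the use of in-sample residuals as the second model's target are all assumptions for illustration, not necessarily what the notebook does.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor, GradientBoostingRegressor
from sklearn.linear_model import RidgeCV
from sklearn.metrics import r2_score

rng = np.random.default_rng(1)
n = 400

# Synthetic data: three "fixed" features and two "renovatable" ones (all made up).
X_fixed = rng.normal(size=(n, 3))
X_reno = rng.normal(size=(n, 2))
y = (
    X_fixed @ np.array([3.0, 2.0, 1.0])
    + X_reno @ np.array([0.8, -0.5])
    + rng.normal(0, 0.5, size=n)
)

train, test = slice(0, 300), slice(300, n)

# Stage 1: model the target from the fixed features only.
first = GradientBoostingRegressor(random_state=0).fit(X_fixed[train], y[train])

# Residuals: the part of the target that stage 1 did not explain.
resid_train = y[train] - first.predict(X_fixed[train])
resid_test = y[test] - first.predict(X_fixed[test])

# Stage 2: predict those residuals from the renovatable features.
second = BaggingRegressor(
    RidgeCV(alphas=np.logspace(-3, 3, 13)),  # hypothetical alpha grid
    n_estimators=10,
    random_state=0,
)
second.fit(X_reno[train], resid_train)

# R^2 of stage 2 measured against the stage-1 residuals, not the raw target.
resid_r2 = r2_score(resid_test, second.predict(X_reno[test]))

# The combined prediction adds the two stages together.
first_r2 = r2_score(y[test], first.predict(X_fixed[test]))
combined_r2 = r2_score(
    y[test], first.predict(X_fixed[test]) + second.predict(X_reno[test])
)
print(f"stage 2 R^2 on residuals: {resid_r2:.3f}")
print(f"stage 1 R^2: {first_r2:.3f}, combined R^2: {combined_r2:.3f}")
```

Note that the second-stage R² is relative to the residuals, so a modest score can still correspond to a real improvement in the combined prediction.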

Classifying abnormal sales was less straightforward because only 7% of the data points were abnormal sales. I tried various classification models, but was unable to get fewer than 32 misclassifications out of 96. The largest coefficients in my logistic regression model were related to porch and pool features.
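For readers who want to experiment with this kind of class imbalance, the sketch below shows one common approach: a stratified split, a class-weighted logistic regression, a confusion matrix, and a ranking of coefficients by magnitude. The data is synthetic, the feature names are hypothetical, and the signal is deliberately planted in the first two columns purely for illustration. Class weighting is my suggestion here, not something the notebook is known to use.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
n = 2000

# Hypothetical feature names; the values are random noise, not real housing data.
feature_names = ["porch_area", "pool_area", "garage_cars", "overall_qual"]
X = rng.normal(size=(n, 4))

# Planted signal: only the first two columns influence the label.
# The intercept is chosen so that positives are rare (well under 20%).
logits = -3.9 + 1.5 * X[:, 0] + 1.0 * X[:, 1]
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

# Stratify so the rare class appears in both splits at the same rate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# class_weight="balanced" reweights samples inversely to class frequency.
clf = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
clf.fit(X_train, y_train)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, clf.predict(X_test), labels=[0, 1])
print(cm)

# Rank features by absolute coefficient (comparable because inputs are scaled).
coefs = clf.named_steps["logisticregression"].coef_[0]
ranked = [feature_names[i] for i in np.argsort(-np.abs(coefs))]
print(ranked)
```

With so few positives, overall accuracy is a poor guide; the confusion matrix shows separately how many abnormal sales were missed and how many normal sales were flagged.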