Machine Learning Algorithms for House Price Prediction: A Comparative Study
Mirali Mammadzade *
Department of Computer Science, University of Lodz, Lodz, Poland.
*Author to whom correspondence should be addressed.
Abstract
Background: Accurate house price prediction supports real estate valuation, investment planning, banking decisions, and urban development by providing data-driven analysis of complex housing factors.
Aims: This study compares the predictive performance of selected machine learning algorithms for house price prediction using the Ames Housing dataset. It also investigates whether target transformation improves model performance and whether feature importance analysis improves interpretability.
Study Design: This study follows a quantitative experimental research design based on supervised machine learning regression methods.
Place and Duration of Study: The study was conducted using the Ames Housing dataset during the period from January to March 2026.
Methodology: The dataset contained 2930 residential property records and more than 80 explanatory variables. The target variable was SalePrice. Data preprocessing included removal of identifier columns, missing value imputation, categorical encoding, numerical scaling, and logarithmic transformation of the target variable. Four machine learning models were implemented using Python and scikit-learn: Linear Regression, Ridge Regression, Decision Tree Regressor, and Random Forest Regressor. The dataset was divided into training and testing subsets using an 80:20 ratio. Model performance was evaluated using Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and the coefficient of determination (R²).
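The workflow described above can be illustrated with a minimal sketch. It is not the study's actual code: the data are synthetic, the column names (Id, LivArea, Quality, Zone) are hypothetical stand-ins for Ames variables, and the choices of median and most-frequent imputation, one-hot encoding, standardization, and log1p as the target transform are assumptions. Metrics are computed here on the original price scale, which is also an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, TransformedTargetRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the housing table; all column names are hypothetical.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "Id": np.arange(n),  # identifier column, dropped below
    "LivArea": rng.normal(1500, 400, n),
    "Quality": rng.integers(1, 11, n).astype(float),
    "Zone": rng.choice(["A", "B", "C"], n),
})
df["SalePrice"] = np.exp(
    10.5 + 0.0004 * df["LivArea"] + 0.08 * df["Quality"] + rng.normal(0, 0.1, n)
)
df.loc[rng.choice(n, 20, replace=False), "LivArea"] = np.nan  # simulate missing values

# Remove the identifier and separate the target, then split 80:20.
X = df.drop(columns=["Id", "SalePrice"])
y = df["SalePrice"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Impute and scale numeric columns; impute and one-hot encode categorical ones.
num_cols, cat_cols = ["LivArea", "Quality"], ["Zone"]
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), num_cols),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ]), cat_cols),
])

models = {
    "Linear Regression": LinearRegression(),
    "Ridge Regression": Ridge(alpha=1.0),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

results = {}
for name, reg in models.items():
    # Fit on log1p(SalePrice); predictions are mapped back with expm1.
    model = TransformedTargetRegressor(
        regressor=Pipeline([("prep", preprocess), ("reg", reg)]),
        func=np.log1p,
        inverse_func=np.expm1,
    )
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    mse = mean_squared_error(y_test, pred)
    results[name] = {
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "MAE": mean_absolute_error(y_test, pred),
        "R2": r2_score(y_test, pred),
    }

for name, m in results.items():
    print(f"{name}: RMSE={m['RMSE']:.0f}  MAE={m['MAE']:.0f}  R2={m['R2']:.3f}")
```

Wrapping the preprocessing and the regressor in a single pipeline means the imputer, scaler, and encoder are fitted on the training subset only, which avoids leaking test-set statistics into the model. The ranking of models on this synthetic data says nothing about their ranking on the Ames data.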
Results: Random Forest achieved the best predictive performance among the selected models. It obtained the lowest error values and the highest R² score on the testing subset, indicating stronger generalization ability. This finding aligns with previous research demonstrating the robustness of ensemble-based methods in predictive modeling tasks. Feature importance analysis revealed that overall quality, above-ground living area, garage capacity, total basement area, and construction year were among the strongest predictors of house prices.
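A feature importance ranking of this kind can be sketched as follows. The abstract does not state which importance measure was used; this example assumes the impurity-based feature_importances_ attribute of scikit-learn's RandomForestRegressor, and the data and feature names are synthetic and hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data: the first feature drives the target most strongly,
# the second weakly, and the third not at all (hypothetical names).
rng = np.random.default_rng(0)
names = ["quality", "living_area", "noise"]
X = rng.normal(size=(500, 3))
y = 5.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.1, size=500)

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances are normalized to sum to 1 across features.
ranking = sorted(
    zip(names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Impurity-based importances can overstate high-cardinality or correlated features, so permutation importance on held-out data (sklearn.inspection.permutation_importance) may be worth considering as a cross-check.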
Conclusion: The study concludes that ensemble learning methods, particularly Random Forest, are highly effective for house price prediction. The findings also show that preprocessing and target transformation are important for improving model accuracy.
Keywords: House price prediction, machine learning, regression analysis, Random Forest, scikit-learn, Ames Housing dataset, feature importance