Comparative Analysis of Car Price Prediction Using Machine Learning and Beyond: A Data-Driven Approach
Keywords:
Car price prediction, Data analysis, Gradient boosting, LightGBM, Machine learning, Neural networks
Abstract
Car price prediction plays a vital role in the automotive industry, giving buyers, sellers, and manufacturers valuable insights for making well-informed decisions. Traditional pricing methods often depend on expert assessments, which can be subjective and inconsistent. This study explores a data-driven approach to car price estimation using machine learning. Using a dataset of 10,000 car records, it examines attributes such as brand, model, production year, engine size, fuel type, transmission type, mileage, number of doors, and ownership history. Exploratory Data Analysis (EDA) is conducted to identify key trends and relationships in the data. Three machine learning models (Gradient Boosting, LightGBM, and Neural Networks) are implemented and evaluated with cross-validation, using accuracy and ROC-AUC as the performance metrics. Gradient Boosting outperforms the other models, achieving an accuracy of 92%. Key factors influencing car prices include production year, mileage, and transmission type. This study highlights the potential of machine learning to automate and improve price prediction. Future research directions include incorporating economic indicators and real-time market trends to further enhance the model’s accuracy and reliability.
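The evaluation setup summarised above can be illustrated with a small, self-contained sketch. Because the abstract reports accuracy and ROC-AUC, the sketch assumes the price target has been converted into a binary label (for example, above or below some price cutoff). That framing, the toy features, and every name below (`StumpBoostClassifier`, `stratified_folds`, `cross_validate`, and so on) are hypothetical illustrations, not the study's code and not the scikit-learn or LightGBM APIs. The model is a deliberately simplified gradient boosting classifier (log-loss, depth-1 regression trees fitted to the negative gradient, leaf values set to the mean residual), scored with accuracy and a rank-based ROC-AUC under stratified k-fold cross-validation.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    return 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))

def fit_stump(X, r):
    """Least-squares regression stump (depth-1 tree) fitted to residuals r."""
    m = sum(r) / len(r)
    best = (sum((v - m) ** 2 for v in r), 0, math.inf, m, m)  # constant-prediction fallback
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for a, b in zip(values, values[1:]):
            thr = (a + b) / 2.0
            left = [ri for row, ri in zip(X, r) if row[j] <= thr]
            right = [ri for row, ri in zip(X, r) if row[j] > thr]
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lv) ** 2 for v in left) + sum((v - rv) ** 2 for v in right)
            if sse < best[0]:
                best = (sse, j, thr, lv, rv)
    return best[1:]  # (feature, threshold, left_value, right_value)

def predict_stump(stump, row):
    j, thr, lv, rv = stump
    return lv if row[j] <= thr else rv

class StumpBoostClassifier:
    """Hypothetical toy gradient boosting for 0/1 labels: log-loss with stump base learners."""
    def __init__(self, n_rounds=50, learning_rate=0.1):
        self.n_rounds, self.learning_rate = n_rounds, learning_rate

    def fit(self, X, y):
        p = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)
        self.f0 = math.log(p / (1 - p))  # start from the training log-odds
        self.stumps = []
        F = [self.f0] * len(y)
        for _ in range(self.n_rounds):
            # Negative gradient of the log-loss with respect to F is y - sigmoid(F).
            r = [yi - sigmoid(fi) for yi, fi in zip(y, F)]
            stump = fit_stump(X, r)
            self.stumps.append(stump)
            F = [fi + self.learning_rate * predict_stump(stump, row) for fi, row in zip(F, X)]
        return self

    def predict_proba(self, X):
        return [sigmoid(self.f0 + self.learning_rate * sum(predict_stump(s, row) for s in self.stumps))
                for row in X]

def accuracy(y_true, y_pred):
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)

def roc_auc(y_true, scores):
    """ROC-AUC via the Mann-Whitney U statistic, using average ranks for ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0  # 1-based average rank of the tie group
        i = j + 1
    n_pos, n_neg = sum(y_true), len(y_true) - sum(y_true)
    pos_rank_sum = sum(rk for rk, yi in zip(ranks, y_true) if yi == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

def stratified_folds(y, k, seed=0):
    """Split indices into k folds that each keep roughly the overall class balance."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in (0, 1):
        members = [i for i, yi in enumerate(y) if yi == label]
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    return folds

def cross_validate(X, y, k=5, **model_kwargs):
    """Mean accuracy and mean ROC-AUC over stratified k-fold cross-validation."""
    accs, aucs = [], []
    for test_idx in stratified_folds(y, k):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in test_set]
        model = StumpBoostClassifier(**model_kwargs).fit(
            [X[i] for i in train_idx], [y[i] for i in train_idx])
        probs = model.predict_proba([X[i] for i in test_idx])
        y_test = [y[i] for i in test_idx]
        accs.append(accuracy(y_test, [int(p >= 0.5) for p in probs]))
        aucs.append(roc_auc(y_test, probs))
    return sum(accs) / k, sum(aucs) / k

if __name__ == "__main__":
    # Hypothetical toy rows: [vehicle age in years, mileage in thousands of km].
    X = [[age, 10 * age] for age in list(range(8)) + list(range(12, 20))]
    y = [1] * 8 + [0] * 8  # 1 = "high price" (assumed binary target)
    print(cross_validate(X, y, k=4, n_rounds=30))
```

Using only the standard library keeps the sketch dependency-free and makes each step (pseudo-residuals, stump fitting, fold assignment, rank-based AUC) visible. A real comparison of the three models would more likely rely on library implementations such as scikit-learn or LightGBM, whose interfaces differ from the hypothetical names used here. On the perfectly separable toy data in the `__main__` block, both cross-validated means come out as 1.0.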