Intellectual Technology of Analysis and Price Forecasting of Used Cars
DOI:
https://doi.org/10.31649/1997-9266-2019-147-6-62-72
Keywords:
intellectual technology, data mining, price prediction, used car, machine learning models
Abstract
To sell a used car profitably, people should be guided not only by their own evaluation or that of third-party experts, but should also use all other suitable resources. Price prediction systems can serve as such a resource: using the common features of a car (such as manufacturer, model, mileage, fuel type, body type, etc.), they are able to predict its likely price. Such systems can support decision-making not only for ordinary car dealers, but also for agencies involved in the ordering and bulk transportation of used cars from abroad. To select the key features and identify the optimal structure and parameters of the models, relevant datasets are selected, exploratory data analysis and feature selection are carried out, and a number of machine learning models are then built, from which the optimal model is chosen according to certain criteria. To build an information system and test the functionality of the proposed intellectual technology, two comparable datasets of used cars from the USA and Ukraine were selected. Python methods and libraries for exploratory data analysis have been systematized, and general recommendations for applying them to this task have been formulated. The general principles of the intellectual technology, which is tested on the selected datasets, are presented. In particular, an exploratory data analysis of the US data was conducted, and a rule for filtering anomalous, and possibly erroneous, data was substantiated. A set of candidate models was selected and trained, and the optimal one was chosen according to the R-squared criterion. The price of a car was predicted with an accuracy of 86.1%. A similar problem was solved for the data on Ukraine, where an accuracy of 85.6% was achieved. This demonstrates the workability of the proposed technology, which has yielded results that are useful in practice.
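The workflow described in the abstract (filter anomalous records, train several candidate models, keep the one with the highest R-squared) could be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation. The synthetic listings, the column names, the interquartile-range filtering rule, and the three candidate regressors are all assumptions made for the example; the abstract does not specify the actual filtering rule or the model set.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def make_synthetic_listings(n=600, seed=0):
    """Generate hypothetical listings (NOT the US or Ukrainian datasets)."""
    rng = np.random.default_rng(seed)
    year = rng.integers(2000, 2020, size=n)
    mileage = rng.uniform(5_000, 250_000, size=n)
    fuel = rng.choice(["gas", "diesel"], size=n)
    price = (
        12_000
        + 900 * (year - 2000)
        - 0.03 * mileage
        + np.where(fuel == "diesel", 1_500, 0)
        + rng.normal(0, 800, size=n)
    )
    df = pd.DataFrame(
        {"year": year, "mileage": mileage, "fuel": fuel, "price": price}
    )
    # Inject a few implausible prices to mimic erroneous listings.
    df.loc[df.index[:5], "price"] = 1_000_000
    return df


def filter_price_outliers(df, k=1.5):
    """Keep rows whose price lies within k * IQR of the quartiles.

    The IQR rule is an assumed stand-in for the paper's filtering rule.
    """
    q1, q3 = df["price"].quantile([0.25, 0.75])
    iqr = q3 - q1
    keep = df["price"].between(q1 - k * iqr, q3 + k * iqr)
    return df[keep]


def select_best_model(df, seed=0):
    """Train candidate regressors and return the best one by test-set R^2."""
    # One-hot encode the categorical feature as floats for scikit-learn.
    X = pd.get_dummies(
        df[["year", "mileage", "fuel"]], columns=["fuel"], dtype=float
    )
    y = df["price"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    candidates = {
        "linear": LinearRegression(),
        "tree": DecisionTreeRegressor(random_state=seed),
        "forest": RandomForestRegressor(n_estimators=50, random_state=seed),
    }
    scores = {}
    for name, model in candidates.items():
        model.fit(X_train, y_train)
        scores[name] = r2_score(y_test, model.predict(X_test))
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    listings = make_synthetic_listings()
    clean = filter_price_outliers(listings)
    best_name, all_scores = select_best_model(clean)
    print(f"rows kept: {len(clean)} of {len(listings)}")
    print(f"best model by R^2: {best_name}")
```

Scoring on a held-out split rather than on the training data keeps the R-squared comparison from favouring models that simply memorize the training rows; whether the authors used a single split or cross-validation is not stated in the abstract.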