Using Machine Learning Methods to Solve Problems of Forecasting the Amount and Probability of Purchase Based on E-Commerce Data

Abstract

This study investigates whether machine learning methods can be used to build models that predict the probability and the amount of a purchase by online store customers. The sample consists of user transaction data from the site ponpare.jp for the period from 01.07.2011 to 23.06.2012. The most common methods for solving similar problems are described and compared, along with the metrics used to evaluate predictions of the fact of a purchase and of its amount. The results show that, for predicting the probability of a purchase, gradient boosting, specifically its LGBMClassifier implementation, gives the most accurate estimates. Gradient boosting also gave the best results for predicting the amount of a customer’s purchase.
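
To illustrate the general idea behind gradient boosting for the purchase amount problem, here is a minimal sketch in pure Python. It is not the authors' pipeline and not LightGBM: it fits one-split decision stumps to the residuals of a squared-error loss, which is the simplest form of the technique. The feature (`visits`), the purchase amounts, and all function names are hypothetical and invented for illustration only; RMSE is used here as an assumed evaluation metric.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Find the threshold split on one feature that minimises squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value, right_value = mean(left), mean(right)
        error = (sum((r - left_value) ** 2 for r in left)
                 + sum((r - right_value) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1:]


def fit_boosting(xs, ys, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared loss: each stump fits the current residuals."""
    base = mean(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared loss the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        predictions = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, predictions)
        ]
    return base, stumps, learning_rate


def predict(model, x):
    """Sum the base value and the shrunken contribution of every stump."""
    base, stumps, learning_rate = model
    return base + sum(
        learning_rate * (left_value if x <= threshold else right_value)
        for threshold, left_value, right_value in stumps
    )


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    return mean((a - p) ** 2 for a, p in zip(actual, predicted)) ** 0.5


# Synthetic toy data (NOT from ponpare.jp): past visits vs. purchase amount.
visits = [1, 2, 3, 4, 5, 6, 7, 8]
amounts = [100, 120, 110, 400, 420, 410, 900, 950]

model = fit_boosting(visits, amounts)
boosted = [predict(model, x) for x in visits]
baseline = [mean(amounts)] * len(amounts)

print("baseline RMSE:", rmse(amounts, baseline))
print("boosted RMSE:", rmse(amounts, boosted))
```

The comparison at the end only shows that the boosted ensemble fits the training data better than a constant prediction. A real evaluation would use held-out data and a production library such as LightGBM.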

General Information

Keywords: probability and purchase amount forecast, classification, regression, data analysis, data processing, machine learning

Journal rubric: Data Analysis

DOI: https://doi.org/10.17759/mda.2020100403

For citation: Mamiev O.A., Finogenov N.A., Sologub G.B. Using Machine Learning Methods to Solve Problems of Forecasting the Amount and Probability of Purchase Based on E-Commerce Data. Modelirovanie i analiz dannikh = Modelling and Data Analysis, 2020. Vol. 10, no. 4, pp. 31–40. DOI: 10.17759/mda.2020100403. (In Russ., abstr. in Engl.)

Information About the Authors

Oleg A. Mamiev, Moscow Aviation Institute (National Research University), Moscow, Russia, ORCID: https://orcid.org/0000-0003-1137-4019, e-mail: olegios@mail.ru

Nikita A. Finogenov, Moscow Aviation Institute (National Research University), Moscow, Russia, ORCID: https://orcid.org/0000-0001-7680-9496, e-mail: finogenov.nik@gmail.com

Gleb B. Sologub, PhD in Physics and Mathematics, Associate Professor of the Department of Mathematical Cybernetics of the Institute of Information Technologies and Applied Mathematics, Moscow Aviation Institute (National Research University), Moscow, Russia, ORCID: https://orcid.org/0000-0002-5657-4826, e-mail: glebsologub@ya.ru
