Prediction the Probability of Purchases Recommended Items

Abstract

This paper discusses methods for improving recommendation systems. Two models for binary classification, random forest and CatBoostClassifier, are compared on the purchase history of Ozon customers. Features were built with standard recommendation techniques: collaborative filtering, cosine similarity of products based on customer views within a single site visit, and similarity of text data. The results were evaluated with metrics that measure the quality of the first k recommended objects: mean average precision at K (map@K) and recall at K (recall@k). Adding features derived from the various object-similarity methods improved the quality of the model predictions. The CatBoostClassifier model showed the best results.
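For readers unfamiliar with the two evaluation metrics named in the abstract, the following is a minimal Python sketch of recall@k and map@K. It is an illustration only, not the authors' implementation. The function names are hypothetical, each recommendation list is assumed to contain no duplicate items, and the normalization of average precision by min(number of relevant items, k) is one common convention that the abstract does not specify.

```python
from typing import Hashable, Sequence, Set


def recall_at_k(recommended: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Share of the relevant (purchased) items found among the first k recommendations."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)


def average_precision_at_k(recommended: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Average of precision values taken at each rank (up to k) where a relevant item occurs.

    Assumption: the sum is normalized by min(len(relevant), k).
    """
    if not relevant:
        return 0.0
    hits = 0
    score = 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank  # precision at this rank
    return score / min(len(relevant), k)


def map_at_k(
    all_recommended: Sequence[Sequence[Hashable]],
    all_relevant: Sequence[Set[Hashable]],
    k: int,
) -> float:
    """Mean of average precision at k over all users."""
    scores = [
        average_precision_at_k(recommended, relevant, k)
        for recommended, relevant in zip(all_recommended, all_relevant)
    ]
    return sum(scores) / len(scores) if scores else 0.0
```

In this sketch, a user who was recommended items a, b, c and bought a, c, e has recall at 3 equal to 2/3, because two of the three purchased items appear in the top three recommendations.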

General Information

Keywords: recommendation systems, machine learning, binary classification, collaborative filtering methods, cosine similarity, map@K, recall@k

Journal rubric: Data Analysis

DOI: https://doi.org/10.17759/mda.2020100402

For citation: Parfenov P.A., Timofeeva A.A., Sologub G.B., Alekseychuk A.S. Prediction the Probability of Purchases Recommended Items. Modelirovanie i analiz dannikh = Modelling and Data Analysis, 2020. Vol. 10, no. 4, pp. 17–30. DOI: 10.17759/mda.2020100402. (In Russ., abstr. in Engl.)

Information About the Authors

Pavel A. Parfenov, Moscow Aviation Institute (National Research University), Russia, ORCID: https://orcid.org/0000-0001-5995-347X, e-mail: pentalbymf@mail.ru

Alena A. Timofeeva, Moscow Aviation Institute (National Research University), Moscow, Russia, ORCID: https://orcid.org/0000-0001-7043-3715, e-mail: alena195101@yandex.ru

Gleb B. Sologub, PhD in Physics and Mathematics, Associate Professor of the Department of Mathematical Cybernetics of Institute of Information Technologies and Applied Mathematics, Moscow Aviation Institute (National Research University), Moscow, Russia, ORCID: https://orcid.org/0000-0002-5657-4826, e-mail: glebsologub@ya.ru

Andrey S. Alekseychuk, PhD in Physics and Mathematics, Associate Professor, Department of Mathematical Cybernetics, Moscow Aviation Institute (National Research University) (MAI), Associate Professor of the Department of Digital Education, Moscow State University of Psychology and Education, Moscow, Russia, ORCID: https://orcid.org/0000-0003-4167-8347, e-mail: alexejchuk@gmail.com
