Development of a Keyphrase Extraction Method Based on a Probabilistic Topic Model

Abstract

The article considers the application of topic modeling to keyword extraction. A new keyword extraction method based on topic modeling has been developed to analyze a collection of documents describing products in an online store. The proposed method is compared with a baseline keyword extraction method, and illustrative results demonstrating the advantages of this approach are presented. The resulting solution can be used to simplify site navigation and the search for relevant products.
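The abstract does not describe the algorithm itself, so the following is only a minimal sketch of one common way to extract keywords with a probabilistic topic model. It is not the authors' method. It assumes LDA trained by collapsed Gibbs sampling and scores each word of a document by p(w|d) = Σ_t p(w|t)·p(t|d). The function names `train_lda` and `extract_keywords`, the hyperparameters, and the toy product descriptions are all hypothetical. Input is assumed to be already tokenized, with no stemming or stop-word removal.

```python
import random

def train_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch, not the article's method).

    Returns (vocab, phi, theta), where phi[t][v] estimates p(word v | topic t)
    and theta[d][t] estimates p(topic t | document d).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    # Count tables: document-topic, topic-word, and per-topic token totals.
    n_dt = [[0] * K for _ in docs]
    n_tw = [[0] * V for _ in range(K)]
    n_t = [0] * K
    # Start from a random topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            t = rng.randrange(K)
            z[d].append(t)
            n_dt[d][t] += 1
            n_tw[t][index[w]] += 1
            n_t[t] += 1
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, t = index[w], z[d][i]
                # Remove the token's current assignment from the counts.
                n_dt[d][t] -= 1
                n_tw[t][wi] -= 1
                n_t[t] -= 1
                # Full conditional p(z = k | all other assignments), unnormalized.
                weights = [(n_dt[d][k] + alpha) * (n_tw[k][wi] + beta) / (n_t[k] + V * beta)
                           for k in range(K)]
                t = rng.choices(range(K), weights=weights)[0]
                z[d][i] = t
                n_dt[d][t] += 1
                n_tw[t][wi] += 1
                n_t[t] += 1
    phi = [[(n_tw[k][v] + beta) / (n_t[k] + V * beta) for v in range(V)] for k in range(K)]
    theta = [[(n_dt[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    return vocab, phi, theta

def extract_keywords(doc_index, docs, vocab, phi, theta, top_n=3):
    """Rank a document's words by p(w|d) = sum_t p(w|t) * p(t|d) (assumed scoring rule)."""
    index = {w: i for i, w in enumerate(vocab)}
    words = sorted(set(docs[doc_index]))  # alphabetical order makes tie-breaking deterministic
    scores = {w: sum(phi[k][index[w]] * theta[doc_index][k] for k in range(len(phi)))
              for w in words}
    return sorted(words, key=lambda w: -scores[w])[:top_n]

# Hypothetical tokenized product descriptions from an online store.
docs = [
    "wireless headphones bluetooth headphones battery noise".split(),
    "bluetooth speaker battery wireless bass".split(),
    "cotton shirt sleeve cotton collar".split(),
    "denim jeans cotton slim fit".split(),
]
vocab, phi, theta = train_lda(docs, num_topics=2)
keywords = extract_keywords(0, docs, vocab, phi, theta)
print(keywords)
```

Because p(w|d) mixes the topic-word distributions weighted by the document's topic proportions, words typical of a document's dominant topics rank highly even when they occur only once in that document. On real catalog data, stop-word filtering and phrase detection would likely be needed before this ranking becomes useful.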

General Information

Keywords: keyword extraction, topic modeling, NLP, LDA, machine learning

Journal rubric: Data Analysis

DOI: https://doi.org/10.17759/mda.2022120202

Received: 18.04.2022

Accepted:

For citation: Romanadze E.L., Sudakov V.A., Kislinsky V.G. Development of a Keyphrase Extraction Method Based on a Probabilistic Topic Model. Modelirovanie i analiz dannikh = Modelling and Data Analysis, 2022. Vol. 12, no. 2, pp. 20–33. DOI: 10.17759/mda.2022120202. (In Russ., abstr. in Engl.)

Information About the Authors

Ekaterina L. Romanadze, Graduate Student, Moscow Aviation Institute (National Research University) (MAI), Moscow, Russia, ORCID: https://orcid.org/0000-0003-0351-7235, e-mail: katia_rom.97@mail.ru

Vladimir A. Sudakov, Doctor of Engineering, Professor of Department 805, Moscow Aviation Institute (MAI), Leading Researcher, Keldysh Institute of Applied Mathematics (Russian Academy of Sciences), Moscow, Russia, ORCID: https://orcid.org/0000-0002-1658-1941, e-mail: sudakov@ws-dss.com

Vadim G. Kislinsky, Researcher, Moscow Institute of Physics and Technology (National Research University) (MFTI), Moscow, Russia, ORCID: https://orcid.org/0000-0003-2000-583X
