Automatic Clustering of Mass Media Documents Based on the Analysis of Their Semantic Content


Abstract

The article describes a solution to the problem of automatically clustering mass media documents based on an analysis of their semantic content. The proposed solution relies on machine grammar methods, on semantic-syntactic and conceptual analysis of texts, and on methods for identifying the conceptual composition of a document collection and formalizing the semantic content of texts. The developed document clustering algorithm can run in a fully automatic mode, without prior machine learning.
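The abstract does not spell out the clustering procedure itself. To illustrate how documents can be grouped without any training step once their conceptual composition is known, the sketch below assumes that each document has already been reduced to a set of concepts (the machine-grammar and conceptual analysis that would produce those sets is not reproduced here). It then joins documents whose concept sets overlap by at least a chosen threshold. The function names, the Jaccard overlap measure, the single-link grouping and the 0.3 threshold are all hypothetical choices for illustration, not the authors' algorithm.

```python
from itertools import combinations

# Hypothetical example input: each document is ALREADY reduced to a set of
# concepts. The authors' concept extraction is not reproduced here.
EXAMPLE_DOCS: dict[str, set[str]] = {
    "d1": {"central bank", "interest rate", "inflation"},
    "d2": {"interest rate", "inflation", "ruble"},
    "d3": {"football", "championship", "team"},
    "d4": {"team", "championship", "coach"},
}


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of concepts two documents have in common (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_by_concepts(docs: dict[str, set[str]], threshold: float = 0.3) -> list[list[str]]:
    """Single-link grouping (an assumed technique): any two documents whose
    concept overlap is >= threshold end up in the same cluster. Clusters are
    the connected components, found with a simple union-find. No training."""
    ids = sorted(docs)
    parent = {d: d for d in ids}

    def find(x: str) -> str:
        # Walk up to the cluster representative, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(ids, 2):
        if jaccard(docs[a], docs[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    groups: dict[str, list[str]] = {}
    for d in ids:
        groups.setdefault(find(d), []).append(d)
    return sorted(groups.values())


if __name__ == "__main__":
    print(cluster_by_concepts(EXAMPLE_DOCS))  # [['d1', 'd2'], ['d3', 'd4']]
```

Single-link grouping needs no preset number of clusters, which fits a fully automatic setting. A known drawback is that chains of moderately similar documents can merge into one large cluster, so the threshold would have to be tuned on real media collections.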

General Information

Keywords: automatic clustering of documents, machine grammar, semantic-syntactic analysis of texts, conceptual analysis of texts, actual conceptual vocabulary

Journal rubric: Data Analysis

Article type: scientific article

DOI: https://doi.org/10.17759/mda.2020100302

For citation: Kan A.V., Kozlovskaya Y.D., Kadushkin N.A., Khoroshilov A.A. Automatic Clustering of Mass Media Documents Based on the Analysis of Their Semantic Content. Modelirovanie i analiz dannikh = Modelling and Data Analysis, 2020. Vol. 10, no. 3, pp. 24–38. DOI: 10.17759/mda.2020100302. (In Russ., abstr. in Engl.)


Information About the Authors

Anna V. Kan, PhD in Engineering, Associate Professor, Institute of Moscow Aviation Institute (National Research University), Head of the Analytical Department, Federal State Budgetary Institution «National Research Center» Institute named after N.E. Zhukovsky, Moscow, Russia, ORCID: https://orcid.org/0000-0001-9410-406X, e-mail: kan_a@mail.ru

Yana D. Kozlovskaya, Student, Institute of Moscow Aviation Institute (National Research University), Moscow, Russia, ORCID: https://orcid.org/0000-0002-1780-5687, e-mail: yana_kozlovskaia@mail.ru

Nikolay A. Kadushkin, Student, Institute of Moscow Aviation Institute (National Research University), Moscow, Russia, ORCID: https://orcid.org/0000-0002-0327-909X, e-mail: bbamrin@gmail.com

Aleksander A. Khoroshilov, Doctor of Engineering, Senior Programmer, AO «NPK "VT i SS"», Moscow, Russia, ORCID: https://orcid.org/0000-0003-4885-3232, e-mail: a.a.horoshilov@mail.ru
