Modelling and Data Analysis
2020. Vol. 10, no. 3, 24–38
doi:10.17759/mda.2020100302
ISSN: 2219-3758 / 2311-9454 (online)
Automatic Clustering of Mass Media Documents Based on the Analysis of Their Semantic Content
Abstract
The article describes a solution to the problem of automatically clustering media documents based on analysis of their semantic content. The proposed solution relies on methods of machine grammar, semantic-syntactic and conceptual analysis of texts, together with methods for identifying the conceptual composition of a document collection and formalizing the semantic content of texts. The resulting clustering algorithm can run fully automatically, without prior machine learning.
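The abstract does not spell out the algorithm, so the following is only a rough illustration of clustering without a training step, not the authors' method. It assumes each document has already been reduced to a set of concept phrases (the kind of output conceptual analysis might produce) and groups documents by Jaccard overlap of those sets. The function names, the 0.3 threshold, and the sample concept sets are all hypothetical.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of concepts two sets have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_by_concepts(
    docs: Dict[str, Set[str]], threshold: float = 0.3
) -> List[List[str]]:
    """Greedy single-pass clustering: no training, no preset cluster count.

    Each document joins the first cluster whose pooled concept set is
    similar enough; otherwise it starts a new cluster.
    """
    clusters: List[List[str]] = []
    pooled: List[Set[str]] = []  # union of concepts per cluster
    for doc_id, concepts in docs.items():
        for i, pool in enumerate(pooled):
            if jaccard(concepts, pool) >= threshold:
                clusters[i].append(doc_id)
                pooled[i] = pool | concepts
                break
        else:
            # No existing cluster was close enough: open a new one.
            clusters.append([doc_id])
            pooled.append(set(concepts))
    return clusters


# Hypothetical concept sets standing in for analyzed media documents.
docs = {
    "doc1": {"central bank", "interest rate", "inflation"},
    "doc2": {"interest rate", "inflation", "ruble"},
    "doc3": {"football", "championship", "goal"},
    "doc4": {"championship", "goal", "coach"},
}
clusters = cluster_by_concepts(docs)
print(clusters)
```

With these sample sets, the two economics documents and the two sports documents end up in separate clusters. The threshold is the only tunable value here and would need to be chosen for a real collection.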
General Information
Keywords: automatic clustering of documents, machine grammar, semantic-syntactic analysis of texts, conceptual analysis of texts, actual conceptual vocabulary
Journal rubric: Data Analysis
Article type: scientific article
DOI: https://doi.org/10.17759/mda.2020100302
For citation: Kan, A.V., Kozlovskaya, Y.D., Kadushkin, N.A., Khoroshilov, A.A. (2020). Automatic Clustering of Mass Media Documents Based on the Analysis of Their Semantic Content. Modelling and Data Analysis, 10(3), 24–38. (In Russ.). https://doi.org/10.17759/mda.2020100302
© Kan A.V., Kozlovskaya Y.D., Kadushkin N.A., Khoroshilov A.A., 2020
License: CC BY-NC 4.0