Automatic Clustering of Mass Media Documents Based on the Analysis of Their Semantic Content



The article describes a solution to the problem of automatically clustering mass media documents based on analysis of their semantic content. The proposed solution relies on methods of machine grammar and of semantic-syntactic and conceptual text analysis, together with methods for identifying the conceptual composition of a document collection and for formalizing the semantic content of texts. The resulting clustering algorithm can run fully automatically, without prior machine learning.

Keywords: automatic clustering of documents, machine grammar, semantic-syntactic analysis of texts, conceptual analysis of texts, actual conceptual vocabulary

Information About the Authors

Anna V. Kan, Candidate of Science (Engineering), Associate Professor, Institute of Moscow Aviation Institute (National Research University), Head of the Analytical Department, Federal State Budgetary Institution «National Research Center "Institute named after N.E. Zhukovsky"», Moscow, Russian Federation

Yana D. Kozlovskaya, Student, Institute of Moscow Aviation Institute (National Research University), Moscow, Russian Federation

Nikolay A. Kadushkin, Student, Institute of Moscow Aviation Institute (National Research University), Moscow, Russian Federation

Aleksander A. Khoroshilov, Doctor of Engineering, Senior Programmer, AO «NPK "VT i SS"», Moscow, Russian Federation



