Identification and Classification of Toxic Statements by Machine Learning Methods



Social media platforms can receive several million comments per day, so their owners are interested in automatic content filtering. This paper considers the task of identifying offensive statements in text. Several text vectorization methods are examined: TF-IDF, Word2Vec, GloVe, and others. Results are presented for classical text classification methods and for neural network methods (LSTM, CNN).

General Information

Keywords: Natural Language Processing (NLP), Classification, Gradient boosting, XGBoost, CatBoost, Recurrent Neural Network, LSTM, Convolutional Neural Network

Journal rubric: Optimization Methods

Article type: scientific article


Received: 18.01.2022


For citation: Platonov E.N., Rudenko V.Y. Identification and Classification of Toxic Statements by Machine Learning Methods. Modelirovanie i analiz dannikh = Modelling and Data Analysis, 2022. Vol. 12, no. 1, pp. 27–48. DOI: 10.17759/mda.2022120103. (In Russ., abstr. in Engl.)


Information About the Authors

Evgeniy N. Platonov, PhD in Physics and Mathematics, Assistant Professor, Moscow Aviation Institute (National Research University), Moscow, Russia

Veronika Y. Rudenko, Student of the Institute of Information Technologies and Applied Mathematics, Moscow Aviation Institute (National Research University), Moscow, Russia



