Modelling and Data Analysis
2025. Vol. 15, no. 4, 156–164
doi:10.17759/mda.2025150410
ISSN: 2219-3758 / 2311-9454 (online)
Semantic analysis of test responses using synthetic data generation
Abstract
Purpose. To evaluate the feasibility of using synthetic data generated by large language models (LLMs) to train automated classifiers of text responses in educational and professional testing. Methods. In the experiment, 100 example responses were generated with LLMs, the text was preprocessed (tokenization, stemming, TF-IDF), and two classifiers were trained, logistic regression and a radial basis function (RBF) network, and then evaluated on a test dataset. Results. Logistic regression reached an accuracy of 80%, and the RBF network reached between 65% and 90%. Systematic limitations were identified: strong dependence on keywords, insensitivity to semantic inversions, and blindness to context during classification. Conclusions. The approach shows promise for building auxiliary assessment tools, but its current limitations rule out fully replacing human evaluators. Further refinement is needed before practical use.
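The pipeline named in the abstract (tokenization, stemming, TF-IDF, logistic regression) can be illustrated with a minimal pure-Python sketch. It is not the author's implementation: the toy responses, the crude suffix-stripping `stem`, the smoothed IDF formula, and the training settings are all assumptions made for illustration, and the RBF network is not covered. The final lines show one way a bag-of-words model can be insensitive to an inversion: two sentences with the same words in a different order receive identical feature vectors.

```python
import math
import re
from collections import Counter

# Hypothetical toy data: 1 = acceptable answer, 0 = unacceptable answer.
TRAIN = [
    ("Regression testing checks that new changes did not break existing features", 1),
    ("It verifies that existing functionality still works after code changes", 1),
    ("We rerun old tests to confirm changes broke nothing that worked", 1),
    ("Regression testing measures how fast the server responds under load", 0),
    ("It is a statistical method for predicting prices", 0),
    ("It means designing the user interface colours", 0),
]

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())

def stem(token):
    """Crude suffix stripping; a stand-in for a real stemmer (assumption)."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    return [stem(t) for t in tokenize(text)]

def fit_idf(docs):
    """Smoothed inverse document frequency for every term seen in training."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf(doc, idf):
    """L2-normalised sparse TF-IDF vector; terms unseen in training are dropped."""
    counts = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(math.fsum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

def score(vec, weights, bias):
    """Linear score; fsum makes the result independent of term order."""
    return bias + math.fsum(weights.get(t, 0.0) * v for t, v in vec.items())

def train_logreg(vectors, labels, epochs=200, lr=0.5):
    """Logistic regression fitted by plain stochastic gradient descent."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for vec, y in zip(vectors, labels):
            err = 1.0 / (1.0 + math.exp(-score(vec, weights, bias))) - y
            bias -= lr * err
            for t, v in vec.items():
                weights[t] = weights.get(t, 0.0) - lr * err * v
    return weights, bias

def predict(vec, weights, bias):
    """Class 1 if the predicted probability is at least 0.5, i.e. score >= 0."""
    return 1 if score(vec, weights, bias) >= 0.0 else 0

idf = fit_idf([preprocess(text) for text, _ in TRAIN])
X = [tfidf(preprocess(text), idf) for text, _ in TRAIN]
y = [label for _, label in TRAIN]
weights, bias = train_logreg(X, y)

# Same words, opposite meaning: word order is invisible to bag-of-words features.
a = "New changes did not break existing features"
b = "Existing features did not break new changes"
va, vb = tfidf(preprocess(a), idf), tfidf(preprocess(b), idf)
print(va == vb)  # True: the two vectors are identical
print(predict(va, weights, bias) == predict(vb, weights, bias))  # True
```

Because both sentences map to the same vector, any classifier trained on these features must assign them the same label, whatever the training data.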
General Information
Keywords: LLM, large language models, generative AI, test automation, text processing
Journal rubric: Short Messages
Article type: announcing
DOI: https://doi.org/10.17759/mda.2025150410
Received 24.10.2025
Revised 11.11.2025
Accepted
Published
For citation: Polyakov, B.Y. (2025). Semantic analysis of test responses using synthetic data generation. Modelling and Data Analysis, 15(4), 156–164. (In Russ.). https://doi.org/10.17759/mda.2025150410
© Polyakov B.Y., 2025
License: CC BY-NC 4.0
Information About the Authors
Conflict of interest
The author declares no conflict of interest.
Ethics statement
The study was conducted using synthetic data generated by language models. As the research did not involve human participants, ethics committee approval was not required.