Extracting Scientific and Technical Facts from Industry Documents Based on Methods of Their Semantic-syntactic and Conceptual Analysis



Extracting scientific and technical facts is difficult because the correctness of the extracted information is hard to guarantee. The proposed fact extraction model rests on an explicit representation of the semantic structure of a text as a hierarchy of syntactic constructions built from meaning units, which makes it possible to identify inter-phrase relations between adjacent sentences. The meaning units are individual words and word combinations that are specific to a particular subject area and form its conceptual composition. The source text is processed with procedures of phraseological, conceptual and semantic-syntactic analysis.
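As a rough illustration of the idea of linking meaning units into semantic triads, the following minimal sketch looks up domain concepts and relation words in a single sentence and emits (concept, relation, concept) triples. It is not the authors' model: the vocabularies `CONCEPTS` and `RELATIONS`, the function names, and the simple adjacency rule are all assumptions made for this example, and the inter-phrase relations between adjacent sentences are not covered.

```python
import re
from typing import List, Tuple

# Hypothetical domain dictionary of meaning units (words and word combinations).
CONCEPTS = {"turbofan engine", "thrust", "fuel consumption", "composite blade"}
# Hypothetical relation markers that may link two meaning units.
RELATIONS = {"increases", "reduces", "uses"}


def find_units(sentence: str) -> List[Tuple[int, str, str]]:
    """Return (position, kind, text) for every concept or relation found."""
    text = sentence.lower()
    found = []
    for kind, vocab in (("concept", CONCEPTS), ("relation", RELATIONS)):
        for phrase in vocab:
            pattern = r"\b" + re.escape(phrase) + r"\b"
            for match in re.finditer(pattern, text):
                found.append((match.start(), kind, phrase))
    return sorted(found)  # order units by their position in the sentence


def extract_triads(sentence: str) -> List[Tuple[str, str, str]]:
    """Emit a triad for each concept, relation, concept run of neighbouring units."""
    units = find_units(sentence)
    triads = []
    for (_, k1, a), (_, k2, r), (_, k3, b) in zip(units, units[1:], units[2:]):
        if (k1, k2, k3) == ("concept", "relation", "concept"):
            triads.append((a, r, b))
    return triads


print(extract_triads("The composite blade reduces fuel consumption."))
```

A dictionary lookup is used here only because it keeps the example self-contained; a real system would rely on morphological and syntactic analysis rather than literal string matching.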

General Information

Keywords: fact extraction, semantic-syntactic analysis, conceptual analysis, semantic triad

Journal rubric: Data Analysis

Article type: scientific article

DOI: https://doi.org/10.17759/mda.2024140102

Received: 04.03.2024


For citation: Kan A.V., Kozlovskaya Y.D., Tokolova A.A. Extracting Scientific and Technical Facts from Industry Documents Based on Methods of Their Semantic-syntactic and Conceptual Analysis. Modelirovanie i analiz dannikh = Modelling and Data Analysis, 2024. Vol. 14, no. 1, pp. 27–40. DOI: 10.17759/mda.2024140102. (In Russ., abstr. in Engl.)



Information About the Authors

Anna V. Kan, PhD in Engineering, Associate Professor, Institute of Moscow Aviation Institute (National Research University), Head of the Analytical Department, Federal State Budgetary Institution «National Research Center» Institute named after N.E. Zhukovsky, Moscow, Russia, ORCID: https://orcid.org/0000-0001-9410-406X, e-mail: kan_a@mail.ru

Yana D. Kozlovskaya, Student, Institute of Moscow Aviation Institute (National Research University), Moscow, Russia, ORCID: https://orcid.org/0000-0002-1780-5687, e-mail: yana_kozlovskaia@mail.ru

Alina A. Tokolova, Master's Student, Institute of Computer Science and Applied Mathematics, Moscow Aviation Institute (National Research University) (MAI), Moscow, Russia, e-mail: tokolovaa@gmail.com


