Modelling and Data Analysis
2025. Vol. 15, no. 4, 87–103
doi:10.17759/mda.2025150406
ISSN: 2219-3758 / 2311-9454 (online)
Optimizing web application test automation using LLM and structural HTML analysis
Abstract
Context and relevance. Modern web application development requires continuous testing, but maintaining automated tests is becoming increasingly labor-intensive due to locator instability and growing interface complexity. The emergence of Large Language Models (LLM) opens new opportunities for test creation automation, but their practical application faces challenges in processing large HTML documents and the need to create maintainable code. Objective. To develop and evaluate the effectiveness of a method for automatic generation of maintainable web application tests using LLM based on HTML structure analysis and the Page Object Model(POM) pattern. Hypotheses. Primary hypothesis: combining LLM with a two-stage generation approach and the POM pattern will enable the creation of maintainable tests, reducing development time by at least one-third (to 67% or less) while preserving code readability. Secondary hypothesis: the success rate of automatic generation will be inversely proportional to the complexity of interface components. Methods and materials. The study employed an approach based on Playwright, LLM, and a two-stage generation procedure with intermediate validation. Testing was conducted on four components of an SPA application for virtual infrastructure management. Validation of results was performed by a team of three testers who assessed the correctness and readability of generated tests. Results. The proposed method achieved high success rates in automatic test generation and substantial reduction in time costs for test creation. The two-stage procedure with intermediate validation enabled localization of a significant portion of errors at the early stage of Page Object creation. Automatically generated tests provided coverage of most required functionality while maintaining code readability. 
An inverse relationship between generation success and interface component complexity was confirmed: standardized interfaces demonstrated significantly higher success rates. Conclusions. The proposed method provides substantial time savings in creating a baseline test suite while maintaining quality and maintainability. The approach is recommended for early stages of feature development with expert control retained for validating critical scenarios. The method is particularly effective for projects with frequent interface changes, large volumes of regression testing, and components with standardized interfaces.
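The two-stage procedure described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the LLM call is replaced by a stub, and all names (InteractiveElementExtractor, generate_page_object, the #user/#pwd/#submit locators) are hypothetical. The key idea shown is intermediate validation: the generated Page Object is syntax-checked before any tests are generated from it, so errors are localized at the first stage.

```python
# Sketch of a two-stage test-generation pipeline with intermediate validation.
# Structural HTML analysis (stage 0) shrinks the prompt; the LLM (stubbed here)
# produces a Page Object (stage 1) that is validated before test generation.
import ast
from html.parser import HTMLParser

class InteractiveElementExtractor(HTMLParser):
    """Stage 0: structural HTML analysis - keep only interactive elements
    so the model input stays small even for large documents."""
    INTERACTIVE = {"button", "input", "select", "a", "textarea"}

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        if tag in self.INTERACTIVE:
            self.elements.append({"tag": tag, **dict(attrs)})

def generate_page_object(elements):
    """Stage 1 (LLM stubbed): emit a Page Object class from the extracted
    structure. A real system would prompt an LLM with these elements."""
    lines = [
        "class LoginPage:",
        "    def __init__(self, page):",
        "        self.page = page",
    ]
    for el in elements:
        name = el.get("id") or el.get("name") or el["tag"]
        lines.append(f"        self.{name} = page.locator('#{name}')")
    return "\n".join(lines)

def validate(code):
    """Intermediate validation: reject a syntactically broken Page Object
    before stage 2 (test generation), localizing errors early."""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False

html = '<form><input id="user"><input id="pwd"><button id="submit">Go</button></form>'
extractor = InteractiveElementExtractor()
extractor.feed(html)
page_object = generate_page_object(extractor.elements)
assert validate(page_object)  # only a valid Page Object proceeds to stage 2
print(page_object)
```

The locator calls in the generated class follow the Playwright Page API in shape only; the snippet never imports Playwright, so it runs without a browser.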
General Information
Keywords: automated testing, test generation, web application, LLM, HTML, Page Object Model, Playwright
Journal rubric: Optimization Methods
Article type: scientific article
DOI: https://doi.org/10.17759/mda.2025150406
Received 09.10.2025
Revised 25.10.2025
For citation: Titeev, A.M. (2025). Optimizing web application test automation using LLM and structural HTML analysis. Modelling and Data Analysis, 15(4), 87–103. (In Russ.). https://doi.org/10.17759/mda.2025150406
© Titeev A.M., 2025
License: CC BY-NC 4.0
Conflict of interest
The authors declare no conflict of interest.