Modelling and Data Analysis
2025. Vol. 15, no. 4, 87–103
doi:10.17759/mda.2025150406
ISSN: 2219-3758 / 2311-9454 (online)
Optimizing web application test automation using LLM and structural HTML analysis
Abstract
Context and relevance. Modern web application development requires continuous testing, but maintaining automated tests is becoming increasingly labor-intensive due to locator instability and growing interface complexity. The emergence of Large Language Models (LLM) opens new opportunities for test creation automation, but their practical application faces challenges in processing large HTML documents and the need to create maintainable code. Objective. To develop and evaluate the effectiveness of a method for automatic generation of maintainable web application tests using LLM based on HTML structure analysis and the Page Object Model(POM) pattern. Hypotheses. Primary hypothesis: combining LLM with a two-stage generation approach and the POM pattern will enable the creation of maintainable tests, reducing development time by at least one-third (to 67% or less) while preserving code readability. Secondary hypothesis: the success rate of automatic generation will be inversely proportional to the complexity of interface components. Methods and materials. The study employed an approach based on Playwright, LLM, and a two-stage generation procedure with intermediate validation. Testing was conducted on four components of an SPA application for virtual infrastructure management. Validation of results was performed by a team of three testers who assessed the correctness and readability of generated tests. Results. The proposed method achieved high success rates in automatic test generation and substantial reduction in time costs for test creation. The two-stage procedure with intermediate validation enabled localization of a significant portion of errors at the early stage of Page Object creation. Automatically generated tests provided coverage of most required functionality while maintaining code readability. 
An inverse relationship between generation success and interface component complexity was confirmed: standardized interfaces demonstrated significantly higher success rates. Conclusions. The proposed method provides substantial time savings in creating a baseline test suite while maintaining quality and maintainability. The approach is recommended for early stages of feature development with expert control retained for validating critical scenarios. The method is particularly effective for projects with frequent interface changes, large volumes of regression testing, and components with standardized interfaces.
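The two-stage procedure described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the LLM call is replaced by a stub, and all names (InteractiveElementExtractor, generate_page_object, the #user/#pwd/#submit locators) are hypothetical. The key idea shown is intermediate validation: the generated Page Object is syntax-checked before any tests are generated from it, so errors are localized at the first stage.

```python
# Sketch of a two-stage test-generation pipeline with intermediate validation.
# Structural HTML analysis (stage 0) shrinks the prompt; the LLM (stubbed here)
# produces a Page Object (stage 1) that is validated before test generation.
import ast
from html.parser import HTMLParser

class InteractiveElementExtractor(HTMLParser):
    """Stage 0: structural HTML analysis - keep only interactive elements
    so the model input stays small even for large documents."""
    INTERACTIVE = {"button", "input", "select", "a", "textarea"}

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        if tag in self.INTERACTIVE:
            self.elements.append({"tag": tag, **dict(attrs)})

def generate_page_object(elements):
    """Stage 1 (LLM stubbed): emit a Page Object class from the extracted
    structure. A real system would prompt an LLM with these elements."""
    lines = [
        "class LoginPage:",
        "    def __init__(self, page):",
        "        self.page = page",
    ]
    for el in elements:
        name = el.get("id") or el.get("name") or el["tag"]
        lines.append(f"        self.{name} = page.locator('#{name}')")
    return "\n".join(lines)

def validate(code):
    """Intermediate validation: reject a syntactically broken Page Object
    before stage 2 (test generation), localizing errors early."""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False

html = '<form><input id="user"><input id="pwd"><button id="submit">Go</button></form>'
extractor = InteractiveElementExtractor()
extractor.feed(html)
page_object = generate_page_object(extractor.elements)
assert validate(page_object)  # only a valid Page Object proceeds to stage 2
print(page_object)
```

The locator calls in the generated class follow the Playwright Page API in shape only; the snippet never imports Playwright, so it runs without a browser.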
General Information
Keywords: automated testing, test generation, web application, LLM, HTML, Page Object Model, Playwright
Journal rubric: Optimization Methods
Article type: scientific article
DOI: https://doi.org/10.17759/mda.2025150406
Received 09.10.2025
Revised 25.10.2025
For citation: Titeev, A.M. (2025). Optimizing web application test automation using LLM and structural HTML analysis. Modelling and Data Analysis, 15(4), 87–103. (In Russ.). https://doi.org/10.17759/mda.2025150406
© Titeev A.M., 2025
License: CC BY-NC 4.0
Conflict of interest
The authors declare no conflict of interest.