Time series forecasting using transformer architecture variants

Abstract

Context and relevance. Time series forecasting is critically important for decision-making, but classical methods such as ARIMA have limitations, and LSTM networks suffer from vanishing gradients. Specialized transformer architectures require theoretical justification. Objective. To investigate the application of transformers to time series forecasting and to determine the conditions under which their modifications are effective. Hypothesis. Transformers outperform classical methods in forecasting accuracy, and the choice of architecture depends on the characteristics of the data. Methods and materials. A theoretical analysis of transformer modifications was conducted, along with an experimental comparison of LSTM, a baseline transformer, Autoformer, and FEDformer on an electricity consumption dataset (1 million records). Results. The baseline transformer showed the best results (RMSE = 0.5874), outperforming LSTM by 8.7%. The effectiveness of specialized architectures depends on the characteristics of the data: for electricity consumption, the advantages of decomposition were minimal. Conclusions. Architecture selection should be based on the specific properties of the time series. Hybrid architectures and lightweight transformer variants for settings with limited computational resources are promising directions.
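
As a worked illustration of the evaluation reported above, the following minimal Python sketch shows how the RMSE metric and the relative improvement (e.g. the reported 8.7% advantage of the baseline transformer over LSTM) can be computed; the function names and the commented usage are illustrative assumptions, not code from the study.

    import numpy as np

    def rmse(y_true, y_pred):
        # Root mean squared error between observed values and forecasts
        y_true = np.asarray(y_true, dtype=float)
        y_pred = np.asarray(y_pred, dtype=float)
        return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

    def improvement_pct(rmse_baseline, rmse_model):
        # Percentage reduction in RMSE relative to a baseline model
        return 100.0 * (rmse_baseline - rmse_model) / rmse_baseline

    # Hypothetical usage with aligned forecast-horizon arrays:
    # print(improvement_pct(rmse(y_true, y_lstm), rmse(y_true, y_transformer)))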

General Information

Keywords: time series forecasting, transformers, attention mechanism, machine learning, deep learning

Journal rubric: Data Analysis

Article type: scientific article

DOI: https://doi.org/10.17759/mda.2025150402

Supplemental data. Datasets available from https://github.com/zhouhaoyi/ETDataset

Received 20.11.2025

Revised 23.11.2025

Accepted

Published

For citation: Orishchenko, V.A. (2025). Time series forecasting using transformer architecture variants. Modelling and Data Analysis, 15(4), 27–37. (In Russ.). https://doi.org/10.17759/mda.2025150402

© Orishchenko V.A., 2025

License: CC BY-NC 4.0


Information About the Authors

Vitaly A. Orishchenko, master's student, researcher, Faculty of Information Technologies, Moscow State University of Psychology and Education, Moscow, Russian Federation, ORCID: https://orcid.org/0009-0003-6696-5147, e-mail: vitalyorischenko@gmail.com

Conflict of interest

The author declares no conflict of interest.
