Overview of modern reinforcement learning methods for solving problems of optimal control of dynamical systems

 
Abstract

Context and relevance. The synthesis of optimal control for nonlinear dynamical systems with constraints and uncertainties remains computationally challenging, especially in aerospace applications. Reinforcement learning is regarded as a practical tool for constructing feedback control laws and/or accelerating planning where classical methods are difficult to apply. Objective. To systematize the classes of algorithms applicable to optimal control problems and to identify criteria for selecting an approach for a specific problem. Hypothesis. Practical applicability is ensured by a correct problem formulation and by accounting for requirements on control continuity, data, safety, and robustness; combined solutions are the most effective. Methods and materials. A review and comparative analysis of the main families of reinforcement learning algorithms was performed. Results. Actor-critic methods remain the foundation for continuous control, while alternatives improve sample efficiency but are sensitive to model errors and to gaps in data coverage. Conclusions. The most promising direction is hybrid architectures that combine reinforcement learning with baseline controllers and guarantee controlled compliance with constraints. The choice of method should be driven not only by control quality, but also by safety, robustness, and computational cost.
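To make the concluding claim concrete, the following minimal sketch (illustrative only, not code from the article) shows one common form of such a hybrid architecture: a learned residual policy corrects a simple baseline PD controller, and the combined action is clipped to actuator limits so that the constraint is satisfied by construction. All function names, gains, and the placeholder "trained" weights are assumptions introduced for illustration.

    import numpy as np

    def baseline_pd(x, kp=2.0, kd=0.5):
        """Stabilizing PD law for a double-integrator state x = [position, velocity]."""
        return -kp * x[0] - kd * x[1]

    def learned_residual(x, w):
        """Stand-in for an RL policy: a linear correction with 'learned' weights w."""
        return float(w @ x)

    def hybrid_control(x, w, u_min=-1.0, u_max=1.0):
        """Baseline plus learned residual, clipped to actuator limits."""
        u = baseline_pd(x) + learned_residual(x, w)
        return float(np.clip(u, u_min, u_max))

    # Simulate a double integrator x_dot = (velocity, u) under the hybrid law.
    x = np.array([1.0, 0.0])    # start with a unit position offset
    w = np.array([-0.3, -0.1])  # placeholder residual weights (as if trained)
    dt = 0.05
    for _ in range(200):
        x = x + dt * np.array([x[1], hybrid_control(x, w)])
    print("final state:", x)    # approaches the origin

In a deployed system the residual would come from a trained actor-critic policy and the clipping step would be replaced by a problem-specific safety filter; the baseline controller guarantees sensible behavior even when the learned component is unreliable.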

General Information

Keywords: reinforcement learning, control theory, dynamical systems, cybernetics

Journal rubric: Numerical Methods

Article type: scientific article

DOI: https://doi.org/10.17759/mda.2026160108

Received 15.02.2026

Revised 20.02.2026

Accepted

Published

For citation: Panovskiy, V.N. (2026). Overview of modern reinforcement learning methods for solving problems of optimal control of dynamical systems. Modelling and Data Analysis, 16(1), 125–140. (In Russ.). https://doi.org/10.17759/mda.2026160108

© Panovskiy V.N., 2026

License: CC BY-NC 4.0


Information About the Authors

Valentin N. Panovskiy, Candidate of Science (Physics and Mathematics), assistant professor of Department № 805 «Mathematical Cybernetics», Institute № 8 «Computer Science and Applied Mathematics», Moscow Aviation Institute (National Research University) (MAI), Moscow, Russian Federation, ORCID: https://orcid.org/0009-0007-1708-8984, e-mail: panovskiy.v@yandex.ru

Contribution of the author

Valentin N. Panovskiy: research idea; detailing and structuring of the review; writing and formatting of the manuscript.

Conflict of interest

The author declares no conflict of interest.
