Overview of modern reinforcement learning methods for solving problems of optimal control of dynamical systems

 
Abstract

Context and relevance. The synthesis of optimal control for nonlinear dynamical systems with constraints and uncertainties remains computationally challenging, especially in aerospace applications. Reinforcement learning is regarded as a practical tool for constructing feedback control laws and/or accelerating planning where classical methods are difficult to apply. Objective. To systematize the classes of algorithms applicable to optimal control problems and to identify criteria for selecting an approach for a specific problem. Hypothesis. Practical applicability is ensured by a correct problem formulation and by accounting for requirements on control continuity, data, safety, and robustness; combined solutions are the most effective. Methods and materials. A review and comparative analysis of the main families of reinforcement learning algorithms was performed. Results. Actor-critic methods remain the foundation for continuous control, while alternatives improve sample efficiency but are sensitive to model errors and to gaps in data coverage. Conclusions. The most promising direction is hybrid architectures that combine reinforcement learning with baseline controllers and guarantee controlled compliance with constraints. The choice of method should be driven not only by control quality, but also by safety, robustness, and computational cost.
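To make the concluding claim concrete, the following minimal sketch (illustrative only, not code from the article) shows one common form of such a hybrid architecture: a learned residual policy corrects a simple baseline PD controller, and the combined action is clipped to actuator limits so that the constraint is satisfied by construction. All function names, gains, and the placeholder "trained" weights are assumptions introduced for illustration.

    import numpy as np

    def baseline_pd(x, kp=2.0, kd=0.5):
        """Stabilizing PD law for a double-integrator state x = [position, velocity]."""
        return -kp * x[0] - kd * x[1]

    def learned_residual(x, w):
        """Stand-in for an RL policy: a linear correction with 'learned' weights w."""
        return float(w @ x)

    def hybrid_control(x, w, u_min=-1.0, u_max=1.0):
        """Baseline plus learned residual, clipped to actuator limits."""
        u = baseline_pd(x) + learned_residual(x, w)
        return float(np.clip(u, u_min, u_max))

    # Simulate a double integrator x_dot = (velocity, u) under the hybrid law.
    x = np.array([1.0, 0.0])    # start with a unit position offset
    w = np.array([-0.3, -0.1])  # placeholder residual weights (as if trained)
    dt = 0.05
    for _ in range(200):
        x = x + dt * np.array([x[1], hybrid_control(x, w)])
    print("final state:", x)    # approaches the origin

In a deployed system the residual would come from a trained actor-critic policy and the clipping step would be replaced by a problem-specific safety filter; the baseline controller guarantees sensible behavior even when the learned component is unreliable.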

General Information

Keywords: reinforcement learning, control theory, dynamical systems, cybernetics

Journal rubric: Numerical Methods

Article type: scientific article

DOI: https://doi.org/10.17759/mda.2026160108

Received 15.02.2026

Revised 20.02.2026

Accepted

Published

For citation: Panovskiy, V.N. (2026). Overview of modern reinforcement learning methods for solving problems of optimal control of dynamical systems. Modelling and Data Analysis, 16(1), 125–140. (In Russ.). https://doi.org/10.17759/mda.2026160108

© Panovskiy V.N., 2026

License: CC BY-NC 4.0


Information About the Authors

Valentin N. Panovskiy, Candidate of Science (Physics and Mathematics), assistant professor of Department № 805 «Mathematical Cybernetics», Institute № 8 «Computer Science and Applied Mathematics», Moscow Aviation Institute (National Research University) (MAI), Moscow, Russian Federation, ORCID: https://orcid.org/0009-0007-1708-8984, e-mail: panovskiy.v@yandex.ru

Contribution of the author

Valentin N. Panovskiy: research idea; detailing and structuring of the review; writing and formatting of the manuscript.

Conflict of interest

The author declares no conflict of interest.
