Modelling and Data Analysis
2026. Vol. 16, no. 1, 125–140
doi:10.17759/mda.2026160108
ISSN: 2219-3758 / 2311-9454 (online)
Overview of modern reinforcement learning methods for solving problems of optimal control of dynamical systems
Abstract
Context and relevance. Synthesizing optimal control for nonlinear dynamical systems subject to constraints and uncertainties remains computationally challenging, especially in aerospace applications. Reinforcement learning is regarded as a practical tool for constructing feedback control laws and/or accelerating planning when classical methods are difficult to apply. Objective. To systematize the classes of reinforcement learning algorithms used for optimal control problems and to identify criteria for choosing an approach for a specific problem. Hypothesis. Practical applicability depends on a correct problem formulation and on accounting for requirements on control continuity, data availability, safety, and robustness; combined solutions are the most effective. Methods and materials. A review and comparative analysis of the main families of reinforcement learning algorithms was carried out. Results. Actor-critic methods remain the basis for continuous control, while alternative approaches improve sample efficiency but are sensitive to model errors and to insufficient data coverage. Conclusions. The most promising are hybrid architectures that combine reinforcement learning with baseline controllers and provide controlled satisfaction of constraints. The choice of method should be determined not only by control quality but also by safety, robustness, and computational cost.
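The hybrid architecture favoured in the conclusions can be illustrated with a minimal sketch, assuming a double-integrator plant, a PD law as the baseline controller, and a box constraint |u| ≤ u_max on the control. All names (`pd_baseline`, `residual_policy`, `hybrid_action`, `simulate`), the gains, and the tanh residual are hypothetical; in practice the residual would be an actor network trained by an actor-critic method such as PPO or SAC. Clipping is only the simplest way to enforce an input constraint; state constraints would need something stronger, for example a safety filter or constrained policy optimization.

```python
import math
from typing import Callable, List, Tuple

State = Tuple[float, float]  # (position, velocity) of a double integrator


def pd_baseline(x: State, kp: float = 2.0, kd: float = 3.0) -> float:
    """Classical baseline controller: PD law driving the state to the origin."""
    pos, vel = x
    return -kp * pos - kd * vel


def residual_policy(x: State) -> float:
    """Hypothetical stand-in for a learned actor (e.g. trained with PPO or SAC).
    Here it is a fixed bounded correction, used purely for illustration."""
    pos, vel = x
    return 0.5 * math.tanh(-pos - 0.5 * vel)


def hybrid_action(x: State, u_max: float, beta: float = 1.0) -> float:
    """Hybrid law u = clip(u_base + beta * u_rl, -u_max, u_max).
    Clipping projects the action onto the admissible set |u| <= u_max,
    so the actuator constraint holds whatever the residual outputs."""
    u = pd_baseline(x) + beta * residual_policy(x)
    return max(-u_max, min(u_max, u))


def simulate(x0: State, u_max: float, dt: float = 0.01, steps: int = 2000,
             controller: Callable[[State, float], float] = hybrid_action,
             ) -> List[Tuple[State, float]]:
    """Explicit-Euler rollout of x1' = x2, x2' = u; returns (state, control) pairs."""
    traj: List[Tuple[State, float]] = []
    pos, vel = x0
    for _ in range(steps):
        u = controller((pos, vel), u_max)
        traj.append(((pos, vel), u))
        pos, vel = pos + dt * vel, vel + dt * u
    return traj


if __name__ == "__main__":
    traj = simulate((1.0, 0.0), u_max=1.0)
    (pos, vel), _ = traj[-1]
    print(f"final state: pos={pos:.2e}, vel={vel:.2e}")
    print("max |u| =", max(abs(u) for _, u in traj))
```

Setting `beta = 0` recovers the pure baseline, which serves as a safe fallback. Projecting the sum rather than the residual alone keeps the hard input constraint independent of whatever the learned component produces.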
General Information
Keywords: reinforcement learning, control theory, dynamical systems, cybernetics
Journal rubric: Numerical Methods
Article type: scientific article
DOI: https://doi.org/10.17759/mda.2026160108
Received 15.02.2026
Revised 20.02.2026
Accepted
Published
For citation: Panovskiy, V.N. (2026). Overview of modern reinforcement learning methods for solving problems of optimal control of dynamical systems. Modelling and Data Analysis, 16(1), 125–140. (In Russ.). https://doi.org/10.17759/mda.2026160108
© Panovskiy V.N., 2026
License: CC BY-NC 4.0
References
- Panteleev A.V., Panovskiy V.N. (2016). Application of interval explosion method for generation of optimal program control of solar sail. Vestnik NPO im. S.A. Lavochkina, 4, 110–117. (In Russ.)
- Panteleev A.V., Bortakovskiy A.S. (2016). Control Theory in Examples and Problems. INFRA-M, Moscow. (In Russ.)
- Achiam J. et al. (2017). Constrained Policy Optimization. https://doi.org/10.48550/arXiv.1705.10528
- Barto A.G., Sutton R.S., Anderson C.W. (1983). Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics. https://doi.org/10.1109/TSMC.1983.6313077
- Bellman R. (1957). Dynamic Programming. Princeton University Press.
- Bertsekas D.P. (2024). Model Predictive Control and Reinforcement Learning. https://doi.org/10.48550/arXiv.2406.00592
- Bianchi C. et al. (2025). Robust solar sail trajectories using proximal policy optimization. Acta Astronautica. https://doi.org/10.1016/j.actaastro.2024.10.065
- Chua K. et al. (2018). Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models. NeurIPS. https://doi.org/10.48550/arXiv.1805.12114
- Fujimoto S., Meger D., Precup D. (2019). Off-Policy Deep Reinforcement Learning without Exploration. ICML. https://doi.org/10.48550/arXiv.1812.02900
- Haarnoja T. et al. (2018). Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. ICML. https://doi.org/10.48550/arXiv.1801.01290
- Ho J., Ermon S. (2016). Generative Adversarial Imitation Learning. NeurIPS. https://doi.org/10.48550/arXiv.1606.03476
- Kostrikov I., Nair A., Levine S. (2022). Offline Reinforcement Learning with Implicit Q-Learning. ICLR. https://doi.org/10.48550/arXiv.2110.06169
- Kumar A. et al. (2019). Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction. NeurIPS. https://doi.org/10.48550/arXiv.1906.00949
- Kumar A. et al. (2020). Conservative Q-Learning for Offline Reinforcement Learning. NeurIPS. https://doi.org/10.48550/arXiv.2006.04779
- Lillicrap T.P. et al. (2016). Continuous control with deep reinforcement learning. ICLR. https://doi.org/10.48550/arXiv.1509.02971
- Ma Z. et al. (2018). Reinforcement Learning-Based Satellite Attitude Stabilization. Sensors. https://doi.org/10.3390/s18124331
- Mayne D.Q. (2014). Model Predictive Control: Recent Developments and Future Promise. Automatica. https://doi.org/10.1016/j.automatica.2014.10.128
- Mnih V. et al. (2015). Human-level control through deep reinforcement learning. Nature. https://doi.org/10.1038/nature14236
- Pinto L. et al. (2017). Robust Adversarial Reinforcement Learning. ICML. https://doi.org/10.48550/arXiv.1703.02702
- Ross S., Gordon G., Bagnell D. (2011). A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning. AISTATS. https://doi.org/10.48550/arXiv.1011.0686
- Schulman J. et al. (2015). Trust Region Policy Optimization. ICML. https://doi.org/10.48550/arXiv.1502.05477
- Schulman J. et al. (2017). Proximal Policy Optimization Algorithms. https://doi.org/10.48550/arXiv.1707.06347
- Sutton R.S., Barto A.G. (2018). Reinforcement Learning: An Introduction (2nd ed.). MIT Press.
- Tobin J. et al. (2017). Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World. IROS. https://doi.org/10.48550/arXiv.1703.06907
- Williams R.J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8, 229–256.
Information About the Authors
Contribution of the authors
Valentin N. Panovskiy - research ideas; detailing and structuring the review, writing and formatting the manuscript.
Conflict of interest
The author declares no conflict of interest.