A Comprehensive Review of Reinforcement Learning: From Classical Frameworks to Deep Learning Paradigms
Jinlong Guo *
School of Mechanical Engineering, North China University of Water Resources and Electric Power, Zhengzhou 450045, China.
*Author to whom correspondence should be addressed.
Abstract
Reinforcement learning (RL) is a foundational paradigm for sequential decision-making in which agents learn to select actions through interaction with an environment so as to maximise long-term utility under uncertainty. Over several decades, RL has progressed from classical, theory-driven methods grounded in dynamic programming, Monte Carlo estimation, and temporal-difference learning to modern deep reinforcement learning paradigms that integrate representation learning with scalable policy optimisation. This review provides a structured synthesis of that evolution, connecting core formulations and algorithmic lineages to contemporary method families, including value-based deep RL, policy-gradient and actor–critic approaches, model-based and planning-augmented RL, offline and data-centric RL, and multi-agent reinforcement learning. The literature selection followed a targeted review strategy, emphasising high-quality journal sources and recent syntheses. Searches were conducted across Web of Science, Scopus, IEEE Xplore, ACM Digital Library, ScienceDirect, SpringerLink, and MDPI’s indexed journal platform using specific search strings and inclusion criteria, prioritising peer-reviewed journal articles and authoritative surveys published between 2020 and 2025. Beyond algorithmic categorisation, the article examines cross-cutting challenges that increasingly determine real-world viability, such as sample efficiency, exploration under sparse feedback, stability and reproducibility, distribution shift in offline settings, robustness to uncertainty and out-of-distribution conditions, and safety assurance through constraint handling and verification. Representative application domains are discussed to highlight practical deployment considerations, including the role of simulation, the need for trustworthy evaluation, and the integration of RL components into broader decision-making pipelines.
By synthesising classical principles with recent advances and emphasising unifying design trade-offs, this review provides researchers and practitioners with a coherent conceptual map of the field, identifies persistent bottlenecks that limit dependable deployment, and outlines research directions for scalable, data-efficient, and trustworthy reinforcement learning systems.
Keywords: reinforcement learning, deep reinforcement learning, model-based reinforcement learning, offline reinforcement learning, multi-agent reinforcement learning, safe and robust RL, exploration, verification, applications