Rollout, Policy Iteration, and Distributed Reinforcement Learning


Author: Dimitri Bertsekas

Publisher: Athena Scientific

Published: 2021-08-20

Total Pages: 498

ISBN-13: 1886529078


The purpose of this book is to develop in greater depth some of the methods from the author's recently published textbook Reinforcement Learning and Optimal Control (Athena Scientific, 2019). In particular, we present new research relating to systems involving multiple agents, partitioned architectures, and distributed asynchronous computation. We pay special attention to the contexts of dynamic programming/policy iteration and control theory/model predictive control. We also discuss in some detail the application of the methodology to challenging discrete/combinatorial optimization problems, such as routing, scheduling, assignment, and mixed integer programming, including the use of neural network approximations within these contexts.

The book focuses on the fundamental idea of policy iteration: start from some policy and successively generate one or more improved policies. If just one improved policy is generated, this is called rollout, which, based on broad and consistent computational experience, appears to be one of the most versatile and reliable of all reinforcement learning methods. In this book, rollout algorithms are developed for both discrete deterministic and stochastic DP problems, together with distributed implementations in both multiagent and multiprocessor settings that aim to take advantage of parallelism. Approximate policy iteration is more ambitious than rollout, but it is a strictly off-line method that is generally far more computationally intensive. This motivates the use of parallel and distributed computation.

One of the purposes of the monograph is to discuss distributed (possibly asynchronous) methods that relate to rollout and policy iteration, both in the context of exact implementations and of approximate implementations involving neural networks or other approximation architectures. Much of the new research is inspired by the remarkable AlphaZero chess program, where policy iteration, value and policy networks, approximate lookahead minimization, and parallel computation all play an important role.
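
To make the rollout idea concrete, the following is a minimal sketch of one-step rollout for a finite-horizon deterministic problem: at each stage, every available control is evaluated by letting a given base policy complete the trajectory, and the control with the lowest total cost is applied. All names (controls, step, cost, base_policy) are illustrative placeholders rather than the book's notation, and the terminal cost is taken to be zero for brevity.

```python
# Minimal sketch of one-step rollout for a finite-horizon deterministic DP problem.
# controls(x, k) enumerates feasible controls, step(x, u, k) gives the next state,
# cost(x, u, k) the stage cost, and base_policy(x, k) the heuristic being improved.

def rollout_control(state, k, horizon, controls, step, cost, base_policy):
    """Pick the control that looks best when the base policy completes the trajectory."""
    best_u, best_q = None, float("inf")
    for u in controls(state, k):
        # One-step cost, then let the base policy run from the resulting state.
        x, q = step(state, u, k), cost(state, u, k)
        for j in range(k + 1, horizon):
            v = base_policy(x, j)
            q += cost(x, v, j)
            x = step(x, v, j)
        if q < best_q:
            best_u, best_q = u, q
    return best_u
```

Applying rollout_control at every stage defines the rollout policy. The book's development also covers stochastic problems, multistep lookahead, and distributed multiagent variants, none of which this sketch attempts.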


Reinforcement Learning and Optimal Control


Author: Dimitri Bertsekas

Publisher: Athena Scientific

Published: 2019-07-01

Total Pages: 388

ISBN-13: 1886529396


This book considers large and challenging multistage decision problems, which can be solved in principle by dynamic programming (DP), but whose exact solution is computationally intractable. We discuss solution methods that rely on approximations to produce suboptimal policies with adequate performance. These methods are collectively known by several essentially equivalent names: reinforcement learning, approximate dynamic programming, and neuro-dynamic programming. They have been at the forefront of research for the last 25 years, and they underlie, among others, the recent impressive successes of self-learning in the context of games such as chess and Go.

Our subject has benefited greatly from the interplay of ideas from optimal control and from artificial intelligence, as it relates to reinforcement learning and simulation-based neural network methods. One of the aims of the book is to explore the common boundary between these two fields and to form a bridge that is accessible by workers with background in either field. Another aim is to organize coherently the broad mosaic of methods that have proved successful in practice while having a solid theoretical and/or logical foundation. This may help researchers and practitioners to find their way through the maze of competing ideas that constitute the current state of the art.

This book relates to several of our other books: Neuro-Dynamic Programming (Athena Scientific, 1996), Dynamic Programming and Optimal Control (4th edition, Athena Scientific, 2017), Abstract Dynamic Programming (2nd edition, Athena Scientific, 2018), and Nonlinear Programming (Athena Scientific, 2016). However, the mathematical style of this book is somewhat different. While we provide a rigorous, albeit short, mathematical account of the theory of finite and infinite horizon dynamic programming, and some fundamental approximation methods, we rely more on intuitive explanations and less on proof-based insights. Moreover, our mathematical requirements are quite modest: calculus, a minimal use of matrix-vector algebra, and elementary probability (mathematically complicated arguments involving laws of large numbers and stochastic convergence are bypassed in favor of intuitive explanations).

The book illustrates the methodology with many examples and illustrations, and uses a gradual expository approach, which proceeds along four directions:
(a) From exact DP to approximate DP: We first discuss exact DP algorithms, explain why they may be difficult to implement, and then use them as the basis for approximations.
(b) From finite horizon to infinite horizon problems: We first discuss finite horizon exact and approximate DP methodologies, which are intuitive and mathematically simple, and then progress to infinite horizon problems.
(c) From deterministic to stochastic models: We often discuss separately deterministic and stochastic problems, since deterministic problems are simpler and offer special advantages for some of our methods.
(d) From model-based to model-free implementations: We first discuss model-based implementations, and then we identify schemes that can be appropriately modified to work with a simulator.
The book is related to and supplemented by the companion research monograph Rollout, Policy Iteration, and Distributed Reinforcement Learning (Athena Scientific, 2020), which focuses more closely on several topics related to rollout, approximate policy iteration, multiagent problems, discrete and Bayesian optimization, and distributed computation; these topics are either discussed in less detail or not covered at all in the present book. The author's website contains class notes and a series of video lectures and slides from a 2021 course at ASU that address a selection of topics from both books.
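
As a concrete illustration of the "exact DP" starting point described in direction (a) of the blurb above, here is a minimal sketch of the finite-horizon backward DP recursion for a deterministic problem with finitely many states and controls. The names states, controls, step, cost, and terminal_cost are illustrative placeholders rather than the book's notation.

```python
# Backward DP recursion for a finite-horizon deterministic problem with finitely many
# states and controls.  step(x, u, k) is assumed to return a state contained in `states`.

def exact_dp(states, controls, step, cost, terminal_cost, horizon):
    J = {x: terminal_cost(x) for x in states}            # terminal cost-to-go J_N
    policy = []
    for k in reversed(range(horizon)):                   # k = N-1, ..., 0
        J_k, mu_k = {}, {}
        for x in states:
            best_u, best_q = None, float("inf")
            for u in controls(x, k):
                q = cost(x, u, k) + J[step(x, u, k)]      # stage cost plus cost-to-go
                if q < best_q:
                    best_u, best_q = u, q
            J_k[x], mu_k[x] = best_q, best_u
        J = J_k
        policy.insert(0, mu_k)
    return J, policy                                      # optimal cost-to-go at stage 0 and policy
```

It is precisely this kind of exact recursion that becomes impractical when the state space is large, which motivates the approximation methods the book develops.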


Sample-Efficient Nonconvex Optimization Algorithms in Machine Learning and Reinforcement Learning


Author: Pan Xu

Publisher:

Published: 2021

Total Pages: 246

ISBN-13:


Machine learning and reinforcement learning have achieved tremendous success in solving problems in various real-world applications. Many modern learning problems boil down to a nonconvex optimization problem, where the objective function is the average or the expectation of some loss function over a finite or infinite dataset. Solving such nonconvex optimization problems, in general, can be NP-hard. Thus one often tackles such a problem through incremental steps based on the nature and the goal of the problem: finding a first-order stationary point, finding a second-order stationary point (or a local optimum), and finding a global optimum. With the size and complexity of the machine learning datasets rapidly increasing, it has become a fundamental challenge to design efficient and scalable machine learning algorithms that can improve the performance in terms of accuracy and save computational cost in terms of sample efficiency at the same time. Though many algorithms based on stochastic gradient descent have been developed and widely studied theoretically and empirically for nonconvex optimization, it has remained an open problem whether we can achieve the optimal sample complexity for finding a first-order stationary point and for finding local optima in nonconvex optimization.

In this thesis, we start with the stochastic nested variance reduced gradient (SNVRG) algorithm, which is developed based on stochastic gradient descent methods and variance reduction techniques. We prove that SNVRG achieves the near-optimal convergence rate among its type for finding a first-order stationary point of a nonconvex function. We further build algorithms to efficiently find the local optimum of a nonconvex objective function by examining the curvature information at the stationary point found by SNVRG. With the ultimate goal of finding the global optimum in nonconvex optimization, we then provide a unified framework to analyze the global convergence of stochastic gradient Langevin dynamics-based algorithms for a nonconvex objective function.

In the second part of this thesis, we generalize the aforementioned sample-efficient stochastic nonconvex optimization methods to reinforcement learning problems, including policy gradient, actor-critic, and Q-learning. For these problems, we propose novel algorithms and prove that they enjoy state-of-the-art theoretical guarantees on the sample complexity. The works presented in this thesis form an incomplete collection of the recent advances and developments of sample-efficient nonconvex optimization algorithms for both machine learning and reinforcement learning.
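
For orientation, the sketch below shows the basic variance-reduction idea that SNVRG builds on, in its simplest single-reference-point (SVRG-style) form for a finite-sum objective. It is illustrative only and is not the nested SNVRG algorithm analyzed in the thesis; the least-squares example and all parameters are synthetic.

```python
import numpy as np

# Illustrative SVRG-style variance-reduced gradient descent on a finite-sum objective
# f(w) = (1/n) * sum_i f_i(w).  SNVRG nests several reference points of this kind;
# this sketch shows only the basic (single-level) variance-reduction idea.

def svrg(grad_i, w0, n, step=0.05, epochs=20, inner=100, seed=0):
    rng = np.random.default_rng(seed)
    w = w0.copy()
    for _ in range(epochs):
        w_ref = w.copy()
        full_grad = np.mean([grad_i(w_ref, i) for i in range(n)], axis=0)
        for _ in range(inner):
            i = rng.integers(n)
            # Control variate: stochastic gradient corrected by the reference gradient.
            g = grad_i(w, i) - grad_i(w_ref, i) + full_grad
            w -= step * g
    return w

# Synthetic example: least squares with f_i(w) = 0.5 * (a_i @ w - b_i) ** 2
rng = np.random.default_rng(1)
A, b = rng.normal(size=(200, 5)), rng.normal(size=200)
grad_i = lambda w, i: (A[i] @ w - b[i]) * A[i]
w_hat = svrg(grad_i, np.zeros(5), n=200)
```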


Non-convex Optimization for Machine Learning


Author: Prateek Jain

Publisher: Foundations and Trends in Machine Learning

Published: 2017-12-04

Total Pages: 218

ISBN-13: 9781680833683


Non-convex Optimization for Machine Learning takes an in-depth look at the basics of non-convex optimization with applications to machine learning. It introduces the rich literature in this area, as well as equips the reader with the tools and techniques needed to apply and analyze simple but powerful procedures for non-convex problems. Non-convex Optimization for Machine Learning is as self-contained as possible while not losing focus of the main topic of non-convex optimization techniques. The monograph initiates the discussion with entire chapters devoted to presenting a tutorial-like treatment of basic concepts in convex analysis and optimization, as well as their non-convex counterparts. The monograph concludes with a look at four interesting applications in the areas of machine learning and signal processing, exploring how the non-convex optimization techniques introduced earlier can be used to solve these problems. The monograph also contains, for each of the topics discussed, exercises and figures designed to engage the reader, as well as extensive bibliographic notes pointing towards classical works and recent advances. Non-convex Optimization for Machine Learning can be used for a semester-length course on the basics of non-convex optimization with applications to machine learning. On the other hand, it is also possible to cherry-pick individual portions, such as the chapter on sparse recovery or the EM algorithm, for inclusion in a broader course. Several courses, such as those in machine learning, optimization, and signal processing, may benefit from the inclusion of such topics.
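
As a flavor of the simple but powerful non-convex procedures of the kind the monograph analyzes, here is a brief sketch of iterative hard thresholding for sparse linear regression, a projected-gradient style method with a non-convex sparsity constraint. The problem data are synthetic and the code is illustrative rather than taken from the monograph.

```python
import numpy as np

# Iterative hard thresholding (IHT) for sparse linear regression:
#   min_w ||A w - b||^2   subject to   ||w||_0 <= s
# Each iteration takes a gradient step and projects onto the set of s-sparse vectors.

def hard_threshold(w, s):
    out = np.zeros_like(w)
    idx = np.argsort(np.abs(w))[-s:]          # keep the s largest-magnitude entries
    out[idx] = w[idx]
    return out

def iht(A, b, s, iters=300):
    step = 1.0 / np.linalg.norm(A, 2) ** 2    # conservative step size
    w = np.zeros(A.shape[1])
    for _ in range(iters):
        w = hard_threshold(w + step * A.T @ (b - A @ w), s)
    return w

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 50))
w_true = np.zeros(50); w_true[:3] = [2.0, -1.5, 1.0]
b = A @ w_true
print(np.round(iht(A, b, s=3)[:5], 3))        # approximately recovers the sparse signal
```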


Deep Reinforcement Learning for Wireless Networks


Author: F. Richard Yu

Publisher: Springer

Published: 2019-01-17

Total Pages: 71

ISBN-13: 3030105466


This SpringerBrief presents a deep reinforcement learning approach to wireless systems to improve system performance. In particular, the deep reinforcement learning approach is applied to cache-enabled opportunistic interference alignment wireless networks and to mobile social networks. Simulation results with different network parameters are presented to show the effectiveness of the proposed scheme. There is a phenomenal burst of research activity in artificial intelligence, deep reinforcement learning, and wireless systems. Deep reinforcement learning has been successfully used to solve many practical problems. For example, Google DeepMind has adopted this method in several artificial intelligence projects with big data (e.g., AlphaGo) and obtained quite good results. Graduate students in electrical and computer engineering, as well as computer science, will find this brief useful as a study guide. Researchers, engineers, computer scientists, programmers, and policy makers will also find this brief to be a useful tool.


Distributed Optimization and Statistical Learning Via the Alternating Direction Method of Multipliers


Author: Stephen Boyd

Publisher: Now Publishers Inc

Published: 2011

Total Pages: 138

ISBN-13: 160198460X


Surveys the theory and history of the alternating direction method of multipliers, and discusses its applications to a wide variety of statistical and machine learning problems of recent interest, including the lasso, sparse logistic regression, basis pursuit, covariance selection, support vector machines, and many others.
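
For reference, here is a minimal sketch of the ADMM iteration for the lasso, one of the applications the monograph surveys, with the usual x-, z-, and u-updates (quadratic solve, soft thresholding, dual ascent). The synthetic data and parameter choices are illustrative.

```python
import numpy as np

# ADMM for the lasso:  min_x  0.5 * ||A x - b||^2 + lam * ||x||_1

def soft_threshold(v, k):
    return np.sign(v) * np.maximum(np.abs(v) - k, 0.0)

def admm_lasso(A, b, lam, rho=1.0, iters=200):
    d = A.shape[1]
    AtA, Atb = A.T @ A, A.T @ b
    x, z, u = np.zeros(d), np.zeros(d), np.zeros(d)
    # Factor (A^T A + rho I) once; it is reused at every iteration.
    L = np.linalg.cholesky(AtA + rho * np.eye(d))
    for _ in range(iters):
        x = np.linalg.solve(L.T, np.linalg.solve(L, Atb + rho * (z - u)))   # x-update
        z = soft_threshold(x + u, lam / rho)                                # z-update
        u = u + x - z                                                       # dual update
    return z

rng = np.random.default_rng(0)
A, b = rng.normal(size=(80, 20)), rng.normal(size=80)
print(np.round(admm_lasso(A, b, lam=1.0)[:5], 3))
```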


Distributed and Error Resilient Convex Optimization Formulations in Machine Learning


Author: Burak Bartan

Publisher:

Published: 2022

Total Pages:

ISBN-13:


Neural networks have been very successful across many domains in machine learning. Training neural networks typically requires minimizing a high-dimensional non-convex function. Stochastic gradient descent and its variants are often used in practice for training neural networks. In this thesis, we describe convex optimization formulations for optimally training neural networks with polynomial activation functions. More specifically, we present semidefinite programming formulations for training neural networks with second-degree polynomial activations and show that their solutions provide globally optimal solutions to the original non-convex training problem. We then extend this strategy to train quantized neural networks with integer weights. We show that we can globally optimize the training loss with respect to integer weights in polynomial time via semidefinite relaxations and randomized rounding.

In the second part of the thesis, we describe a distributed computing and optimization framework to train models, including our convex neural networks. The second-order optimization methods proposed in this part rely on approximating the Hessian matrix via random projections. In particular, we describe how to employ randomized sketches to reduce the problem dimensions as well as to preserve privacy and improve straggler resilience in asynchronous distributed systems. We present novel approximation guarantees as well as closed-form expressions for debiasing the update directions of the optimization algorithm. Finally, we establish a novel connection between randomized sketching and coded computation. The proposed approach builds on polar codes for straggler-resilient distributed computing.
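
As a rough illustration of the general idea of approximating the Hessian via random projections (not the thesis's actual framework, which also addresses debiasing, privacy, and straggler resilience), the sketch below performs damped Newton-type steps for least squares in which the Hessian A^T A is replaced by (S A)^T (S A) for a random Gaussian sketching matrix S. All sizes and parameters are illustrative.

```python
import numpy as np

# Sketched Newton-type iteration for least squares: the Hessian H = A^T A is replaced by
# the sketched approximation (S A)^T (S A) with S an m x n Gaussian sketch, m << n.
# A damped step guards against the sketching error (a line search is common in practice).

def sketched_newton(A, b, sketch_dim, iters=20, damping=0.5, seed=0):
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(iters):
        S = rng.normal(size=(sketch_dim, n)) / np.sqrt(sketch_dim)
        SA = S @ A
        H_sketch = SA.T @ SA                      # sketched Hessian approximation
        grad = A.T @ (A @ x - b)                  # exact gradient
        x = x - damping * np.linalg.solve(H_sketch, grad)
    return x

rng = np.random.default_rng(1)
A, b = rng.normal(size=(2000, 50)), rng.normal(size=2000)
x = sketched_newton(A, b, sketch_dim=400)
print(np.linalg.norm(A.T @ (A @ x - b)))          # gradient norm after the sketched steps
```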


First-order and Stochastic Optimization Methods for Machine Learning


Author: Guanghui Lan

Publisher: Springer Nature

Published: 2020-05-15

Total Pages: 591

ISBN-13: 3030395685


This book covers not only foundational material but also the most recent progress made during the past few years in the area of machine learning algorithms. In spite of the intensive research and development in this area, there does not exist a systematic treatment that introduces the fundamental concepts and recent progress on machine learning algorithms, especially those based on stochastic optimization methods, randomized algorithms, nonconvex optimization, distributed and online learning, and projection-free methods. This book will benefit a broad audience in the machine learning, artificial intelligence, and mathematical programming communities by presenting these recent developments in a tutorial style, starting from the basic building blocks and proceeding to the most carefully designed and complicated algorithms for machine learning.
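
As one small example of the projection-free methods the book covers, here is a minimal Frank-Wolfe (conditional gradient) sketch for minimizing a smooth convex quadratic over the probability simplex; each step solves a linear minimization over the feasible set instead of a projection. The objective and all parameters are synthetic and illustrative.

```python
import numpy as np

# Frank-Wolfe (conditional gradient) over the probability simplex with the classical
# step size 2 / (t + 2).  The linear minimization over the simplex is a vertex pick.

def frank_wolfe(grad, d, iters=200):
    x = np.ones(d) / d                            # start at the simplex center
    for t in range(iters):
        g = grad(x)
        s = np.zeros(d)
        s[np.argmin(g)] = 1.0                     # linear minimization oracle
        x = x + 2.0 / (t + 2.0) * (s - x)         # convex combination stays feasible
    return x

rng = np.random.default_rng(0)
Q = rng.normal(size=(10, 10)); Q = Q.T @ Q + np.eye(10)
c = rng.normal(size=10)
x_star = frank_wolfe(lambda x: Q @ x + c, d=10)
print(np.round(x_star, 3), round(float(x_star.sum()), 3))   # iterate remains in the simplex
```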


Applicability of Deep Learning Approaches to Non-convex Optimization for Trajectory-based Policy Search


Author: Robert H. Verkuil

Publisher:

Published: 2019

Total Pages: 76

ISBN-13:


Trajectory optimization is a powerful tool for determining good control sequences for actuating dynamical systems. In the past decade, trajectory optimization has been successfully used to train and guide policy search within deep neural networks via optimizing over many trajectories simultaneously, subject to a shared neural network policy constraint. This thesis seeks to understand how this specific formulation converges in comparison to known globally optimal policies for simple classical control systems. To do so, results from three lines of experimentation are presented. First, trajectory optimization control solutions are compared against globally optimal policies determined via value iteration on simple control tasks. Second, three systems built for parallelized, non-convex optimization across trajectories with a shared neural network constraint are described and analyzed. Finally, techniques from deep learning known to improve convergence speed and quality in non-convex optimization are studied when applied to both the shared neural networks and the trajectories used to train them.
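
For context, the sketch below shows tabular value iteration of the kind that can supply globally optimal reference policies on small, discretized control tasks. The randomly generated MDP here is synthetic and is not one of the control systems studied in the thesis.

```python
import numpy as np

# Tabular value iteration for a discounted finite MDP.
# P has shape (A, S, S): P[a, s, s'] is the transition probability under action a.
# c has shape (A, S):    c[a, s] is the expected stage cost of action a in state s.

def value_iteration(P, c, gamma=0.95, tol=1e-8):
    J = np.zeros(P.shape[1])
    while True:
        Q = c + gamma * (P @ J)                   # Q[a, s] = c[a, s] + gamma * E[J(s')]
        J_new = Q.min(axis=0)
        if np.max(np.abs(J_new - J)) < tol:
            return J_new, Q.argmin(axis=0)        # optimal cost-to-go and greedy policy
        J = J_new

rng = np.random.default_rng(0)
num_actions, num_states = 3, 20
P = rng.random((num_actions, num_states, num_states))
P /= P.sum(axis=2, keepdims=True)                 # normalize rows into distributions
c = rng.random((num_actions, num_states))
J_opt, pi_opt = value_iteration(P, c)
```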