On the Evolution of Return Distributions in Continuous-time Reinforcement Learning
Author: Harley Wiltzer
Published: 2022
"This thesis develops the theory of distributional reinforcement learning in the continuous-time setting. Inspired by the literature on continuous-time reinforcement learning and optimal control, we demonstrate that existing (discrete-time) distributional reinforcement learning algorithms may fail to converge on the correct return distributions even in very simple environments. To account for this, we characterize the return distributions induced by a broad class of continuous-time stochastic Markov Reward Processes, and we use this characterization to inform distributional reinforcement learning algorithms that account for continuous-time evolution. The characterization takes the form of a family of partial differential equations on the space of return distributions. Furthermore, we address the issue of representing arbitrary probability measures with bounded space, and in doing so we show how, under a particular choice of representation, the return distributions are characterized by a set of Hamilton-Jacobi-Bellman equations, which are ubiquitous in the optimal control literature. We then demonstrate a construction of a continuous-time distributional algorithm and study its convergence properties in the policy evaluation setting. Finally, we provide an implementation using deep neural networks and evaluate its performance empirically against various benchmarks."--
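For readers unfamiliar with the discrete-time algorithms the abstract refers to, the following is a minimal sketch (my own illustration, not the thesis's method) of categorical distributional policy evaluation in the style of C51: each state carries a categorical distribution over fixed return atoms, and the distributional Bellman target is projected back onto those atoms. The two-state Markov Reward Process and all parameter choices here are hypothetical, chosen only to make the update concrete.

```python
import numpy as np

# Hypothetical two-state Markov Reward Process: state 0 pays reward 1 and
# moves to state 1, which is absorbing with reward 0.  With gamma = 0.9 the
# return from state 0 is exactly 1, so the return distribution there is a
# point mass at 1.

GAMMA = 0.9
N_ATOMS = 51
V_MIN, V_MAX = 0.0, 2.0
atoms = np.linspace(V_MIN, V_MAX, N_ATOMS)
dz = atoms[1] - atoms[0]

def project(samples, probs):
    """Project a discrete distribution (samples, probs) onto the fixed atoms."""
    out = np.zeros(N_ATOMS)
    b = (np.clip(samples, V_MIN, V_MAX) - V_MIN) / dz
    lo, hi = np.floor(b).astype(int), np.ceil(b).astype(int)
    # Split each sample's mass between its two neighbouring atoms.
    np.add.at(out, lo, probs * (hi - b))
    np.add.at(out, hi, probs * (b - lo))
    # A sample sitting exactly on an atom (lo == hi) gets its full mass there.
    exact = lo == hi
    np.add.at(out, lo[exact], probs[exact])
    return out

# Return-distribution estimate per state, initialised to a point mass on atom 0.
eta = np.zeros((2, N_ATOMS))
eta[:, 0] = 1.0

rewards = np.array([1.0, 0.0])
next_state = np.array([1, 1])  # state 0 -> 1; state 1 is absorbing

for _ in range(200):
    new_eta = np.empty_like(eta)
    for s in range(2):
        # Distributional Bellman target: r(s) + gamma * G(s'), then project.
        new_eta[s] = project(rewards[s] + GAMMA * atoms, eta[next_state[s]])
    eta = new_eta

means = eta @ atoms
print(means)  # state 0's mean return should be close to 1.0, state 1's to 0.0
```

The thesis's observation is that updates of this kind are tied to a fixed time step; shrinking the step toward the continuous-time limit is exactly where such discrete-time schemes can converge to the wrong return distributions, motivating the PDE characterization described above.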