Skip to main navigation Skip to search Skip to main content

Policy optimization as wasserstein gradient flows

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

21 Scopus citations

Abstract

Policy optimization is a core component of reinforcement learning (RL), and most existing RL methods directly optimize parameters of a policy based on maximizing the expected total reward, or its surrogate. Though often achieving encouraging empirical success, its underlying mathematical principle on policy-distribution optimization is unclear. We place policy optimization into the space of probability measures, and interpret it as Wasserstein gradient flows. On the probability-measure space, under specified circumstances, policy optimization becomes a convex problem in terms of distribution optimization. To make optimization feasible, we develop efficient algorithms by numerically solving the corresponding discrete gradient flows. Our technique is applicable to several RL settings, and is related to many state-of-the-art policy-optimization algorithms. Empirical results verify the effectiveness of our framework, often obtaining better performance compared to related algorithms.

Original languageEnglish
Title of host publication35th International Conference on Machine Learning, ICML 2018
EditorsAndreas Krause, Jennifer Dy
PublisherInternational Machine Learning Society (IMLS)
Pages9134-9143
Number of pages10
ISBN (Electronic)9781510867963
StatePublished - 2018
Event35th International Conference on Machine Learning, ICML 2018 - Stockholm, Sweden
Duration: Jul 10 2018Jul 15 2018

Publication series

Name35th International Conference on Machine Learning, ICML 2018
Volume13

Conference

Conference35th International Conference on Machine Learning, ICML 2018
Country/TerritorySweden
CityStockholm
Period07/10/1807/15/18

Fingerprint

Dive into the research topics of 'Policy optimization as wasserstein gradient flows'. Together they form a unique fingerprint.

Cite this