
Information-Directed Policy Search in Sparse-Reward Settings via the Occupancy Information Ratio

  • Wesley A. Suttle
  • Alec Koppel
  • Ji Liu
  • U.S. Army Research Laboratory
  • J.P. Morgan Chase & Co.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

This paper examines a new measure of the exploration/exploitation trade-off in reinforcement learning (RL) called the occupancy information ratio (OIR). To this end, the paper derives the Information-Directed Actor-Critic (IDAC) algorithm for solving the OIR problem, provides an overview of the rich theory underlying IDAC and related OIR policy gradient methods, and experimentally investigates the advantages of such methods. The central contribution of this paper is to provide empirical evidence that, due to the form of the OIR objective, IDAC enjoys superior performance over vanilla RL methods in sparse-reward environments.
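For orientation, the following is a minimal sketch of how an occupancy-information-ratio-style quantity could be computed for a small tabular problem. It assumes, following the authors' related work on the OIR, that the ratio is a policy's long-run average cost divided by the entropy of its long-run state occupancy measure. The offset `kappa`, the function names, and the two-state example chains are hypothetical choices made for illustration. This is not the IDAC algorithm and not the paper's implementation.

```python
import math

def stationary_distribution(P, iters=10_000, tol=1e-12):
    """Approximate the stationary distribution of a row-stochastic matrix P
    by power iteration. Assumes the induced Markov chain is ergodic."""
    n = len(P)
    d = [1.0 / n] * n
    for _ in range(iters):
        new = [sum(d[s] * P[s][t] for s in range(n)) for t in range(n)]
        if max(abs(a - b) for a, b in zip(new, d)) < tol:
            return new
        d = new
    return d

def entropy(d):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in d if p > 0.0)

def occupancy_information_ratio(P, cost, kappa=1.0):
    """Average cost divided by (kappa + occupancy entropy).
    kappa > 0 is an assumed offset that keeps the denominator positive."""
    d = stationary_distribution(P)
    avg_cost = sum(p * c for p, c in zip(d, cost))
    return avg_cost / (kappa + entropy(d))

# Hypothetical two-state example: state 0 costs 1, state 1 (the "goal") costs 0.
cost = [1.0, 0.0]
P_a = [[0.9, 0.1], [0.9, 0.1]]  # policy A: mostly sits in the costly state
P_b = [[0.5, 0.5], [0.5, 0.5]]  # policy B: visits both states evenly

oir_a = occupancy_information_ratio(P_a, cost)
oir_b = occupancy_information_ratio(P_b, cost)
print(oir_a, oir_b)
```

Under this cost-based reading, policy B obtains the smaller ratio because it both incurs lower average cost and spreads its occupancy more evenly across states, which is the kind of trade-off the abstract refers to.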

Original language: English
Title of host publication: 2023 57th Annual Conference on Information Sciences and Systems, CISS 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781665451819
State: Published - 2023
Event: 57th Annual Conference on Information Sciences and Systems, CISS 2023 - Baltimore, United States
Duration: Mar 22 2023 to Mar 24 2023

Publication series

Name: 2023 57th Annual Conference on Information Sciences and Systems, CISS 2023

Conference

Conference: 57th Annual Conference on Information Sciences and Systems, CISS 2023
Country/Territory: United States
City: Baltimore
Period: 03/22/23 to 03/24/23

Keywords

  • exploration vs. exploitation
  • reinforcement learning
  • sparse rewards
