Learning Symbolic Representations for Reinforcement Learning of Non-Markovian Behavior

Bibliographic Details
Title: Learning Symbolic Representations for Reinforcement Learning of Non-Markovian Behavior
Authors: Christoffersen, Phillip J. K.; Li, Andrew C.; Icarte, Rodrigo Toro; McIlraith, Sheila A.
Publication Year: 2023
Collection: Computer Science
Subject Terms: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
More Details: Many real-world reinforcement learning (RL) problems necessitate learning complex, temporally extended behavior that may only receive a reward signal when the behavior is completed. If the reward-worthy behavior is known, it can be specified in terms of a non-Markovian reward function, that is, a function that depends on aspects of the state-action history rather than just the current state and action. Such reward functions yield sparse rewards, necessitating an inordinate number of experiences to find a policy that captures the reward-worthy pattern of behavior. Recent work has leveraged Knowledge Representation (KR) to provide a symbolic abstraction of aspects of the state that summarize reward-relevant properties of the state-action history and support learning a Markovian decomposition of the problem in terms of an automaton over the KR. Providing such a decomposition has been shown to vastly improve learning rates, especially when coupled with algorithms that exploit automaton structure. Nevertheless, such techniques rely on a priori knowledge of the KR. In this work, we explore how to automatically discover useful state abstractions that support learning automata over the state-action history. The result is an end-to-end algorithm that can learn optimal policies with significantly fewer environment samples than state-of-the-art RL on simple non-Markovian domains.
Comment: 7 pages, 2 figures, presented at the KR2ML workshop at NeurIPS 2020
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2301.02952
Accession Number: edsarx.2301.02952
Database: arXiv
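
As a rough illustration of what the abstract means by a Markovian decomposition "in terms of an automaton", the sketch below hand-writes a tiny automaton for a hypothetical task: reward is paid only when the symbol 'b' is observed after 'a' has already been seen. The symbols, the task, and the names TRANSITIONS, step, and run are assumptions made for this example and do not come from the paper. The paper's contribution is to learn the state abstraction and the automaton, whereas here both are fixed by hand.

```python
from typing import Dict, List, Tuple

# Hypothetical task (not from the paper): reward 1.0 only when 'b' is
# observed after 'a' has already been observed at some earlier step.
# TRANSITIONS maps (automaton state, symbol) -> (next state, reward).
TRANSITIONS: Dict[Tuple[int, str], Tuple[int, float]] = {
    (0, "a"): (1, 0.0),  # saw 'a': remember it by moving to state 1
    (1, "b"): (2, 1.0),  # saw 'b' after 'a': task complete, pay reward
}
TERMINAL = 2


def step(u: int, symbol: str) -> Tuple[int, float]:
    """Advance the automaton by one symbol.

    Pairs not listed in TRANSITIONS self-loop with zero reward.
    """
    return TRANSITIONS.get((u, symbol), (u, 0.0))


def run(symbols: List[str]) -> Tuple[int, float]:
    """Feed a sequence of symbols; return (final state, total reward)."""
    u, total = 0, 0.0
    for s in symbols:
        if u == TERMINAL:
            break  # stop once the task has been completed
        u, r = step(u, s)
        total += r
    return u, total
```

In this sketch, run(["b", "a", "c", "b"]) ends in state 2 with total reward 1.0, while run(["b", "b"]) stays in state 0 with reward 0.0. Both sequences end on the same symbol, so the reward is non-Markovian in the symbols alone, but it depends only on the pair (automaton state, current symbol).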