Approximate Policy Iteration with Bisimulation Metrics
Title: | Approximate Policy Iteration with Bisimulation Metrics |
---|---|
Authors: | Kemertas, Mete, Jepson, Allan |
Publication Year: | 2022 |
Collection: | Computer Science |
Subject Terms: | Computer Science - Machine Learning, Computer Science - Artificial Intelligence, I.2.6 |
More Details: | Bisimulation metrics define a distance measure between states of a Markov decision process (MDP) based on a comparison of reward sequences. Due to this property, they provide theoretical guarantees in value function approximation (VFA). In this work we first prove that bisimulation and $\pi$-bisimulation metrics can be defined via a more general class of Sinkhorn distances, which unifies various state similarity metrics used in recent work. Then we describe an approximate policy iteration (API) procedure that uses a bisimulation-based discretization of the state space for VFA and prove asymptotic performance bounds. Next, we bound the difference between $\pi$-bisimulation metrics in terms of the change in the policies themselves. Based on these results, we design an API($\alpha$) procedure that employs conservative policy updates and enjoys better performance bounds than the naive API approach. We discuss how such API procedures map onto practical actor-critic methods that use bisimulation metrics for state representation learning. Lastly, we validate our theoretical results and investigate their practical implications via a controlled empirical analysis based on an implementation of bisimulation-based API for finite MDPs. Comment: Accepted to Transactions on Machine Learning Research (TMLR) |
Document Type: | Working Paper |
Access URL: | http://arxiv.org/abs/2202.02881 |
Accession Number: | edsarx.2202.02881 |
Database: | arXiv |
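The abstract states that $\pi$-bisimulation metrics can be defined via Sinkhorn distances and computed for finite MDPs. As an illustrative sketch only (not the paper's implementation), the following computes a Sinkhorn-based $\pi$-bisimulation metric on a toy two-state MDP by fixed-point iteration of $d(s,s') = |R(s) - R(s')| + \gamma\, S_\varepsilon(P(\cdot\,|\,s), P(\cdot\,|\,s'); d)$, where $S_\varepsilon$ is the entropic-regularized optimal-transport cost under ground metric $d$. The function names, the toy MDP, and the regularization strength $\varepsilon$ are assumptions for illustration; as $\varepsilon \to 0$ the inner distance approaches the Wasserstein-1 term used in the standard $\pi$-bisimulation fixed point.

```python
import numpy as np

def _logsumexp(x, axis):
    # numerically stable log-sum-exp along the given axis
    m = x.max(axis=axis, keepdims=True)
    return np.squeeze(m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True)), axis=axis)

def sinkhorn_distance(cost, p, q, eps=0.5, iters=300):
    """Entropic-regularized OT cost between p and q (log-domain Sinkhorn).

    Assumes p and q are strictly positive probability vectors.
    """
    f = np.zeros_like(p)
    g = np.zeros_like(q)
    logp, logq = np.log(p), np.log(q)
    for _ in range(iters):
        # alternating dual-potential updates in the log domain
        f = eps * (logp - _logsumexp((g[None, :] - cost) / eps, axis=1))
        g = eps * (logq - _logsumexp((f[:, None] - cost) / eps, axis=0))
    plan = np.exp((f[:, None] + g[None, :] - cost) / eps)
    return float((plan * cost).sum())

def pi_bisim_metric(R, P, gamma=0.5, eps=0.5, n_iter=30):
    """Fixed-point iteration for a Sinkhorn-based pi-bisimulation metric.

    R: (n,) expected rewards under a fixed policy pi.
    P: (n, n) row-stochastic transition matrix under pi.
    """
    n = len(R)
    d = np.zeros((n, n))
    for _ in range(n_iter):
        # transport cost between next-state distributions under the current metric
        W = np.array([[sinkhorn_distance(d, P[i], P[j], eps) for j in range(n)]
                      for i in range(n)])
        d = np.abs(R[:, None] - R[None, :]) + gamma * W
    return d

# Toy 2-state MDP under a fixed policy (illustrative numbers).
R = np.array([0.0, 1.0])
P = np.array([[0.9, 0.1],
              [0.1, 0.9]])
d = pi_bisim_metric(R, P)
```

Because the update is a $\gamma$-contraction, the outer loop converges geometrically; with $\varepsilon > 0$ the resulting distance carries an entropic bias (e.g. small but nonzero self-distances), which is consistent with the generalized family of state similarity metrics the paper unifies.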