RGMComm: Return Gap Minimization via Discrete Communications in Multi-Agent Reinforcement Learning

Bibliographic Details
Title: RGMComm: Return Gap Minimization via Discrete Communications in Multi-Agent Reinforcement Learning
Authors: Chen, Jingdi; Lan, Tian; Joe-Wong, Carlee
Publication Year: 2023
Collection: Computer Science
Subject Terms: Computer Science - Artificial Intelligence
More Details: Communication is crucial for solving cooperative Multi-Agent Reinforcement Learning tasks in partially observable Markov Decision Processes. Existing works often rely on black-box methods to encode local information/features into messages shared with other agents, producing continuous messages with high communication overhead and poor interpretability. Prior attempts at discrete communication generate one-hot vectors trained as part of the agents' actions and use the Gumbel-softmax operation to compute message gradients; these are heuristic designs that provide no quantitative guarantee on the expected return. This paper establishes an upper bound on the return gap between an ideal policy with full observability and an optimal partially observable policy with discrete communication. This result lets us recast multi-agent communication as a novel online clustering problem over the local observations at each agent, with messages as cluster labels and the upper bound on the return gap as the clustering loss. To minimize the return gap, we propose the Return-Gap-Minimization Communication (RGMComm) algorithm: a surprisingly simple design of discrete message-generation functions, integrated with reinforcement learning through a novel Regularized Information Maximization loss function that uses cosine distance as the clustering metric. Evaluations show that RGMComm significantly outperforms state-of-the-art multi-agent communication baselines and achieves nearly optimal returns with few-bit messages that are naturally interpretable. (A code sketch of this clustering view follows the record summary below.)
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2308.03358
Accession Number: edsarx.2308.03358
Database: arXiv
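
The abstract frames message generation as online clustering of each agent's local observations, with discrete messages as cluster labels and a Regularized Information Maximization (RIM) loss that uses cosine distance as the clustering metric. The following Python sketch illustrates that idea only in outline: the encoder sizes, centroid parameterization, temperature, loss weighting, and the names DiscreteMessenger and rim_style_loss are assumptions for illustration, not the paper's formulation, and the paper's coupling of the clustering loss to Q-value differences (and the RL training loop) is omitted.

import torch
import torch.nn as nn
import torch.nn.functional as F


class DiscreteMessenger(nn.Module):
    # Sketch of the message-as-cluster-label idea from the abstract.
    # Layer sizes, the centroid parameterization, and the temperature
    # are illustrative assumptions, not the paper's specification.
    def __init__(self, obs_dim: int, embed_dim: int = 32, n_messages: int = 4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 64),
            nn.ReLU(),
            nn.Linear(64, embed_dim),
        )
        # One learnable centroid per discrete message label.
        self.centroids = nn.Parameter(torch.randn(n_messages, embed_dim))

    def soft_assign(self, obs: torch.Tensor) -> torch.Tensor:
        # Soft cluster assignment from cosine similarity to each centroid
        # (cosine distance is the clustering metric named in the abstract).
        z = F.normalize(self.encoder(obs), dim=-1)
        c = F.normalize(self.centroids, dim=-1)
        return F.softmax(z @ c.t() / 0.1, dim=-1)  # temperature 0.1 (assumed)

    def message(self, obs: torch.Tensor) -> torch.Tensor:
        # Discrete message = index of the nearest centroid; with
        # n_messages = 4 this is a 2-bit message.
        return self.soft_assign(obs).argmax(dim=-1)


def rim_style_loss(p: torch.Tensor, balance: float = 1.0) -> torch.Tensor:
    # RIM-style objective over soft assignments p: maximize
    # I(obs; message) = H(message) - H(message | obs), i.e., confident
    # per-observation labels plus balanced usage of all labels. The
    # balance weight is an assumption; the paper's tie to Q-value
    # differences is omitted here.
    eps = 1e-8
    cond_entropy = -(p * (p + eps).log()).sum(dim=-1).mean()   # H(m | o), want low
    marginal = p.mean(dim=0)
    marg_entropy = -(marginal * (marginal + eps).log()).sum()  # H(m), want high
    return cond_entropy - balance * marg_entropy


# Usage: cluster a batch of local observations into 2-bit messages.
messenger = DiscreteMessenger(obs_dim=8)
obs = torch.randn(16, 8)
loss = rim_style_loss(messenger.soft_assign(obs))
loss.backward()
print(messenger.message(obs))  # e.g., tensor([3, 0, 2, ...]) -- discrete labels
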
FullText Text:
  Availability: 0
Header DbId: edsarx
DbLabel: arXiv
An: edsarx.2308.03358
RelevancyScore: 1065
AccessLevel: 3
PubType: Report
PubTypeId: report
PreciseRelevancyScore: 1065.24353027344
RecordInfo BibRecord:
  BibEntity:
    Subjects:
      – SubjectFull: Computer Science - Artificial Intelligence
        Type: general
    Titles:
      – TitleFull: RGMComm: Return Gap Minimization via Discrete Communications in Multi-Agent Reinforcement Learning
        Type: main
  BibRelationships:
    HasContributorRelationships:
      – PersonEntity:
          Name:
            NameFull: Chen, Jingdi
      – PersonEntity:
          Name:
            NameFull: Lan, Tian
      – PersonEntity:
          Name:
            NameFull: Joe-Wong, Carlee
    IsPartOfRelationships:
      – BibEntity:
          Dates:
            – D: 07
              M: 08
              Type: published
              Y: 2023
ResultId 1