RGMComm: Return Gap Minimization via Discrete Communications in Multi-Agent Reinforcement Learning

Bibliographic Details
Title: RGMComm: Return Gap Minimization via Discrete Communications in Multi-Agent Reinforcement Learning
Authors: Chen, Jingdi; Lan, Tian; Joe-Wong, Carlee
Publication Year: 2023
Collection: Computer Science
Subject Terms: Computer Science - Artificial Intelligence
More Details: Communication is crucial for solving cooperative Multi-Agent Reinforcement Learning tasks in partially observable Markov Decision Processes. Existing works often rely on black-box methods to encode local information/features into messages shared with other agents, producing continuous messages with high communication overhead and poor interpretability. Prior attempts at discrete communication generate one-hot vectors trained as part of the agents' actions and use the Gumbel-softmax operation to compute message gradients; these are heuristic designs that provide no quantitative guarantee on the expected return. This paper establishes an upper bound on the return gap between an ideal policy with full observability and an optimal partially observable policy with discrete communication. This result lets us recast multi-agent communication as a novel online clustering problem over the local observations at each agent, with messages as cluster labels and the upper bound on the return gap as the clustering loss. To minimize the return gap, we propose the Return-Gap-Minimization Communication (RGMComm) algorithm: a surprisingly simple design of discrete message-generation functions, integrated with reinforcement learning through a novel Regularized Information Maximization loss function that uses cosine distance as the clustering metric. Evaluations show that RGMComm significantly outperforms state-of-the-art multi-agent communication baselines and achieves nearly optimal returns with few-bit messages that are naturally interpretable. (A code sketch of this clustering view follows the record summary below.)
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2308.03358
Accession Number: edsarx.2308.03358
Database: arXiv
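
The abstract frames message generation as online clustering of each agent's local observations, with discrete messages as cluster labels and a Regularized Information Maximization (RIM) loss that uses cosine distance as the clustering metric. The following Python sketch illustrates that idea only in outline: the encoder sizes, centroid parameterization, temperature, loss weighting, and the names DiscreteMessenger and rim_style_loss are assumptions for illustration, not the paper's formulation, and the paper's coupling of the clustering loss to Q-value differences (and the RL training loop) is omitted.

import torch
import torch.nn as nn
import torch.nn.functional as F


class DiscreteMessenger(nn.Module):
    # Sketch of the message-as-cluster-label idea from the abstract.
    # Layer sizes, the centroid parameterization, and the temperature
    # are illustrative assumptions, not the paper's specification.
    def __init__(self, obs_dim: int, embed_dim: int = 32, n_messages: int = 4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 64),
            nn.ReLU(),
            nn.Linear(64, embed_dim),
        )
        # One learnable centroid per discrete message label.
        self.centroids = nn.Parameter(torch.randn(n_messages, embed_dim))

    def soft_assign(self, obs: torch.Tensor) -> torch.Tensor:
        # Soft cluster assignment from cosine similarity to each centroid
        # (cosine distance is the clustering metric named in the abstract).
        z = F.normalize(self.encoder(obs), dim=-1)
        c = F.normalize(self.centroids, dim=-1)
        return F.softmax(z @ c.t() / 0.1, dim=-1)  # temperature 0.1 (assumed)

    def message(self, obs: torch.Tensor) -> torch.Tensor:
        # Discrete message = index of the nearest centroid; with
        # n_messages = 4 this is a 2-bit message.
        return self.soft_assign(obs).argmax(dim=-1)


def rim_style_loss(p: torch.Tensor, balance: float = 1.0) -> torch.Tensor:
    # RIM-style objective over soft assignments p: maximize
    # I(obs; message) = H(message) - H(message | obs), i.e., confident
    # per-observation labels plus balanced usage of all labels. The
    # balance weight is an assumption; the paper's tie to Q-value
    # differences is omitted here.
    eps = 1e-8
    cond_entropy = -(p * (p + eps).log()).sum(dim=-1).mean()   # H(m | o), want low
    marginal = p.mean(dim=0)
    marg_entropy = -(marginal * (marginal + eps).log()).sum()  # H(m), want high
    return cond_entropy - balance * marg_entropy


# Usage: cluster a batch of local observations into 2-bit messages.
messenger = DiscreteMessenger(obs_dim=8)
obs = torch.randn(16, 8)
loss = rim_style_loss(messenger.soft_assign(obs))
loss.backward()
print(messenger.message(obs))  # e.g., tensor([3, 0, 2, ...]) -- discrete labels
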
FullText Text:
  Availability: 0
Header DbId: edsarx
DbLabel: arXiv
An: edsarx.2308.03358
RelevancyScore: 1065
AccessLevel: 3
PubType: Report
PubTypeId: report
PreciseRelevancyScore: 1065.24353027344
RecordInfo BibRecord:
  BibEntity:
    Subjects:
      – SubjectFull: Computer Science - Artificial Intelligence
        Type: general
    Titles:
      – TitleFull: RGMComm: Return Gap Minimization via Discrete Communications in Multi-Agent Reinforcement Learning
        Type: main
  BibRelationships:
    HasContributorRelationships:
      – PersonEntity:
          Name:
            NameFull: Chen, Jingdi
      – PersonEntity:
          Name:
            NameFull: Lan, Tian
      – PersonEntity:
          Name:
            NameFull: Joe-Wong, Carlee
    IsPartOfRelationships:
      – BibEntity:
          Dates:
            – D: 07
              M: 08
              Type: published
              Y: 2023
ResultId 1