AVATAR: A Parallel Corpus for Java-Python Program Translation

Bibliographic Details
Title: AVATAR: A Parallel Corpus for Java-Python Program Translation
Authors: Ahmad, Wasi Uddin; Tushar, Md Golam Rahman; Chakraborty, Saikat; Chang, Kai-Wei
Publication Year: 2021
Collection: Computer Science
Subject Terms: Computer Science - Software Engineering; Computer Science - Computation and Language
Abstract: Program translation refers to migrating source code from one programming language to another. It has tremendous practical value in software development, as porting software across languages is time-consuming and costly. Automating program translation is of paramount importance in software migration, and recently researchers explored unsupervised approaches due to the unavailability of parallel corpora. However, the availability of pre-trained language models for programming languages enables supervised fine-tuning with a small number of labeled examples. Therefore, we present AVATAR, a collection of 9,515 programming problems and their solutions written in two popular languages, Java and Python. AVATAR is collected from competitive programming sites, online platforms, and open-source repositories. Furthermore, AVATAR includes unit tests for 250 examples to facilitate functional correctness evaluation. We benchmark several pre-trained language models fine-tuned on AVATAR. Experiment results show that the models lack in generating functionally accurate code.
Comment: Accepted to Findings of ACL 2023
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2108.11590
Accession Number: edsarx.2108.11590
Database: arXiv
Date Published: 2021-08-26