AI Coders Are Among Us: Rethinking Programming Language Grammar Towards Efficient Code Generation
Title: | AI Coders Are Among Us: Rethinking Programming Language Grammar Towards Efficient Code Generation |
---|---|
Authors: | Sun, Zhensu; Du, Xiaoning; Yang, Zhou; Li, Li; Lo, David |
Publication Year: | 2024 |
Collection: | Computer Science |
Subject Terms: | Computer Science - Software Engineering, Computer Science - Artificial Intelligence, Computer Science - Programming Languages |
More Details: | Artificial Intelligence (AI) models have emerged as another important audience for programming languages alongside humans and machines, as we enter the era of large language models (LLMs). LLMs can now perform well in coding competitions and even write programs like developers to solve various tasks, including mathematical problems. However, the grammar and layout of current programs are designed to cater to the needs of human developers, with many grammar tokens and formatting tokens being used to make the code easier for humans to read. While this is helpful, such a design adds unnecessary computational work for LLMs, as each token they either use or produce consumes computational resources. To improve inference efficiency and reduce computational costs, we propose the concept of AI-oriented grammar. This aims to represent code in a way that better suits the working mechanism of AI models. Code written with AI-oriented grammar discards formatting and uses a minimum number of tokens to convey code semantics effectively. To demonstrate the feasibility of this concept, we explore and implement the first AI-oriented grammar for Python, named SimPy. SimPy is crafted by revising the original Python grammar through a series of heuristic rules. Programs written in SimPy maintain identical AST structures to those in standard Python. This allows not only for execution via a modified AST parser, but also for seamless transformation between programs written in Python and SimPy, enabling human developers and LLMs to use Python and SimPy, respectively, when they need to collaborate. In the experiments, compared with Python, SimPy enables a reduction in token usage of 13.5% and 10.4% for CodeLlama and GPT-4, respectively, when completing the same set of code-related tasks. Additionally, these models can maintain or even improve their performance when using SimPy instead of Python for these tasks. Comment: Accepted by ISSTA'24 |
Document Type: | Working Paper |
Access URL: | http://arxiv.org/abs/2404.16333 |
Accession Number: | edsarx.2404.16333 |
Database: | arXiv |
Publication Date: | 25 April 2024 |
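The abstract's central idea, that layout can be stripped from a program without changing its AST, can be illustrated with standard Python alone. The sketch below is not SimPy and uses neither the paper's tooling nor an LLM tokenizer; it assumes only Python's built-in `ast` and `tokenize` modules, and the sample program and helper names are hypothetical. It checks that two layouts of the same program share an AST and counts the layout-only lexical tokens in each.

```python
import ast
import io
import tokenize

# Hypothetical sample program, laid out for human readers.
READABLE = '''
def add(a, b):
    total = a + b
    return total
'''

# The same program in a more compact layout (still plain Python, not SimPy).
COMPACT = "def add(a,b):\n total=a+b;return total\n"

# Lexer token types that exist only to express layout.
LAYOUT_TYPES = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT, tokenize.DEDENT}


def same_ast(src_a, src_b):
    """Return True if both sources parse to structurally identical ASTs."""
    # ast.dump omits line/column attributes by default, so layout is ignored.
    return ast.dump(ast.parse(src_a)) == ast.dump(ast.parse(src_b))


def layout_token_count(src):
    """Count the layout-only tokens produced by Python's own lexer."""
    tokens = tokenize.generate_tokens(io.StringIO(src).readline)
    return sum(1 for tok in tokens if tok.type in LAYOUT_TYPES)


if __name__ == "__main__":
    print("identical AST:", same_ast(READABLE, COMPACT))
    print("layout tokens (readable):", layout_token_count(READABLE))
    print("layout tokens (compact):", layout_token_count(COMPACT))
```

These counts come from Python's lexer, not from the subword tokens an LLM consumes, so they are only a rough proxy for the reductions reported in the abstract.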