ODE Transformer: An Ordinary Differential Equation-Inspired Model for Sequence Generation
Title: | ODE Transformer: An Ordinary Differential Equation-Inspired Model for Sequence Generation |
---|---|
Authors: | Li, Bei; Du, Quan; Zhou, Tao; Jing, Yi; Zhou, Shuhan; Zeng, Xin; Xiao, Tong; Zhu, JingBo; Liu, Xuebo; Zhang, Min |
Publication Year: | 2022 |
Collection: | Computer Science |
Subject Terms: | Computer Science - Computation and Language |
More Details: | Residual networks are an Euler discretization of solutions to Ordinary Differential Equations (ODEs). This paper explores a deeper relationship between Transformer and numerical ODE methods. We first show that a residual block of layers in Transformer can be described as a higher-order solution to an ODE. Inspired by this, we design a new architecture, ODE Transformer, which is analogous to the Runge-Kutta method that is well motivated in ODE. As a natural extension to Transformer, ODE Transformer is easy to implement and efficient to use. Experimental results on the large-scale machine translation, abstractive summarization, and grammar error correction tasks demonstrate the high genericity of ODE Transformer. It can gain large improvements in model performance over strong baselines (e.g., 30.77 and 44.11 BLEU scores on the WMT'14 English-German and English-French benchmarks) at a slight cost in inference efficiency. |
Comment: | Long paper accepted by ACL 2022 main conference. arXiv admin note: substantial text overlap with arXiv:2104.02308 |
Document Type: | Working Paper |
Access URL: | http://arxiv.org/abs/2203.09176 |
Accession Number: | edsarx.2203.09176 |
Database: | arXiv |
Publication Date: | 17 March 2022 |
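
The abstract's opening claim, that a residual connection behaves like one Euler step of an ODE solver while a Runge-Kutta-style block is a higher-order step, can be illustrated with a small numerical sketch. This is not the paper's architecture: the function names (`euler_block`, `rk2_block`, `integrate`, `decay`), the step size, and the toy equation dy/dt = -y are all assumptions chosen for illustration, and the second-order update shown is the standard Heun form of Runge-Kutta.

```python
import math


def euler_block(y, f, h=1.0):
    # First-order (Euler) update. With h = 1 this has the same shape as a
    # residual connection: y + F(y).
    return y + h * f(y)


def rk2_block(y, f, h=1.0):
    # Second-order Runge-Kutta (Heun) update: evaluate f twice and average.
    # Illustrative only; not taken from the paper.
    f1 = f(y)
    f2 = f(y + h * f1)
    return y + 0.5 * h * (f1 + f2)


def integrate(block, f, y0, h, steps):
    # Stack `steps` identical blocks, as a deep network stacks layers.
    y = y0
    for _ in range(steps):
        y = block(y, f, h)
    return y


def decay(y):
    # Toy dynamics (an assumption): dy/dt = -y, exact solution y(t) = exp(-t).
    return -y


exact = math.exp(-1.0)
euler_err = abs(integrate(euler_block, decay, 1.0, 0.1, 10) - exact)
rk2_err = abs(integrate(rk2_block, decay, 1.0, 0.1, 10) - exact)
print("Euler error:", euler_err)
print("RK2 error:  ", rk2_err)
```

With the same number of stacked blocks, the second-order update tracks the exact solution more closely on this toy problem, at the cost of two evaluations of `f` per block.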