ODE Transformer: An Ordinary Differential Equation-Inspired Model for Sequence Generation

Bibliographic Details
Title: ODE Transformer: An Ordinary Differential Equation-Inspired Model for Sequence Generation
Authors: Li, Bei; Du, Quan; Zhou, Tao; Jing, Yi; Zhou, Shuhan; Zeng, Xin; Xiao, Tong; Zhu, JingBo; Liu, Xuebo; Zhang, Min
Publication Year: 2022
Collection: Computer Science
Subject Terms: Computer Science - Computation and Language
More Details: Residual networks are an Euler discretization of solutions to Ordinary Differential Equations (ODE). This paper explores a deeper relationship between Transformer and numerical ODE methods. We first show that a residual block of layers in Transformer can be described as a higher-order solution to ODE. Inspired by this, we design a new architecture, ODE Transformer, which is analogous to the Runge-Kutta method that is well motivated in ODE. As a natural extension to Transformer, ODE Transformer is easy to implement and efficient to use. Experimental results on the large-scale machine translation, abstractive summarization, and grammar error correction tasks demonstrate the high genericity of ODE Transformer. It can gain large improvements in model performance over strong baselines (e.g., 30.77 and 44.11 BLEU scores on the WMT'14 English-German and English-French benchmarks) at a slight cost in inference efficiency.
Comment: Long paper accepted by the ACL 2022 main conference. arXiv admin note: substantial text overlap with arXiv:2104.02308
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2203.09176
Accession Number: edsarx.2203.09176
Database: arXiv
Publication Date: 2022-03-17
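
The abstract's analogy between residual connections and ODE solvers can be illustrated with a small numerical sketch. The code below is not taken from the paper or its released code; it only contrasts a first-order Euler update (the form a plain residual connection takes) with a second-order Runge-Kutta update (Heun's method) on a toy scalar ODE, dy/dt = -y. The function names `euler_step`, `rk2_step`, and `integrate`, the toy ODE, and the step size are all assumptions chosen for illustration.

```python
import math


def euler_step(f, y, h=1.0):
    # First-order update: y_next = y + h * f(y).
    # With h = 1 this has the same shape as a residual connection y + F(y).
    return y + h * f(y)


def rk2_step(f, y, h=1.0):
    # Second-order Runge-Kutta (Heun) update: evaluate f twice and
    # average the two slope estimates before taking the step.
    k1 = f(y)
    k2 = f(y + h * k1)
    return y + 0.5 * h * (k1 + k2)


def integrate(step, f, y0, h, n):
    # Apply the chosen single-step rule n times starting from y0.
    y = y0
    for _ in range(n):
        y = step(f, y, h)
    return y


def f(y):
    # Toy right-hand side: dy/dt = -y, whose exact solution is exp(-t).
    return -y


h, n = 0.1, 10                      # integrate from t = 0 to t = 1
exact = math.exp(-1.0)
euler_err = abs(integrate(euler_step, f, 1.0, h, n) - exact)
rk2_err = abs(integrate(rk2_step, f, 1.0, h, n) - exact)
print(f"Euler error: {euler_err:.5f}")
print(f"RK2 error:   {rk2_err:.5f}")
```

With the same step size and number of steps, the second-order update lands closer to the exact solution than the Euler update, at the cost of one extra evaluation of `f` per step. This is the general trade-off between accuracy and computation that the abstract alludes to when it mentions a slight cost in inference efficiency.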