Title: ODE Transformer: An Ordinary Differential Equation-Inspired Model for Sequence Generation

Mar 17, 2022

Bei Li, Quan Du, Tao Zhou, Yi Jing, Shuhan Zhou, Xin Zeng, Tong Xiao, JingBo Zhu, Xuebo Liu, Min Zhang

Abstract: Residual networks can be viewed as an Euler discretization of the solution of an Ordinary Differential Equation (ODE). This paper explores a deeper relationship between the Transformer and numerical ODE methods. We first show that a residual block of layers in the Transformer can be described as a higher-order solution to an ODE. Inspired by this, we design a new architecture, ODE Transformer, which is analogous to the Runge-Kutta method, a well-motivated family of methods for solving ODEs. As a natural extension of the Transformer, ODE Transformer is easy to implement and efficient to use. Experimental results on large-scale machine translation, abstractive summarization, and grammatical error correction tasks demonstrate the generality of ODE Transformer. It achieves large improvements over strong baselines (e.g., 30.77 and 44.11 BLEU on the WMT'14 English-German and English-French benchmarks) at a slight cost in inference efficiency.
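As a rough illustration of the Euler/Runge-Kutta analogy above, the sketch below treats a layer as a function `f` and compares a plain residual update (one Euler step with step size 1) against second- and fourth-order Runge-Kutta updates that reuse the same `f` at every stage. The names `euler_block`, `rk2_block`, `rk4_block` and the toy layer are hypothetical: real layers would be attention or feed-forward sublayers acting on tensors, and the paper's actual block design (for example, any learned stage coefficients) may differ from these textbook RK formulas.

```python
import math
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def axpy(y: Vector, d: Vector, scale: float = 1.0) -> Vector:
    """Return y + scale * d element-wise."""
    return [yi + scale * di for yi, di in zip(y, d)]


def euler_block(y: Vector, f: Layer) -> Vector:
    # Standard residual connection: y_{t+1} = y_t + F(y_t),
    # i.e. one explicit Euler step with step size 1.
    return axpy(y, f(y))


def rk2_block(y: Vector, f: Layer) -> Vector:
    # Second-order (Heun-style) RK step; f is evaluated twice.
    f1 = f(y)
    f2 = f(axpy(y, f1))
    return [yi + 0.5 * (a + b) for yi, a, b in zip(y, f1, f2)]


def rk4_block(y: Vector, f: Layer) -> Vector:
    # Classical fourth-order RK step; f is evaluated four times.
    f1 = f(y)
    f2 = f(axpy(y, f1, 0.5))
    f3 = f(axpy(y, f2, 0.5))
    f4 = f(axpy(y, f3))
    return [
        yi + (a + 2 * b + 2 * c + d) / 6
        for yi, a, b, c, d in zip(y, f1, f2, f3, f4)
    ]


def toy_layer(y: Vector) -> Vector:
    # Hypothetical stand-in for a sublayer: dy/dt = -0.5 * y.
    return [-0.5 * v for v in y]


if __name__ == "__main__":
    y0 = [1.0]
    print(f"euler {euler_block(y0, toy_layer)[0]:.4f}")  # euler 0.5000
    print(f"rk2   {rk2_block(y0, toy_layer)[0]:.4f}")    # rk2   0.6250
    print(f"rk4   {rk4_block(y0, toy_layer)[0]:.4f}")    # rk4   0.6068
    print(f"exact {math.exp(-0.5):.4f}")                 # exact 0.6065
```

On this toy layer, whose exact one-step solution is e^{-0.5} ≈ 0.6065, the higher-order updates track the underlying ODE more closely, at the cost of more evaluations of `f` per block (one for Euler, two for RK2, four for RK4).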

* Long paper accepted to the ACL 2022 main conference. arXiv admin note: substantial text overlap with arXiv:2104.02308
