
Title: KERPLE: Kernelized Relative Positional Embedding for Length Extrapolation

May 20, 2022

Ta-Chung Chi, Ting-Han Fan, Peter J. Ramadge, Alexander I. Rudnicky


Abstract: Relative positional embeddings (RPEs) have received considerable attention because they effectively model the relative distance between tokens and enable length extrapolation. We propose KERPLE, a framework that generalizes relative positional embeddings for extrapolation by kernelizing positional differences. We achieve this using conditionally positive definite (CPD) kernels, a class of functions known to generalize distance metrics. To maintain the inner-product interpretation of self-attention, we show that a CPD kernel can be transformed into a PD kernel by adding a constant offset. This offset is implicitly absorbed by the Softmax normalization in self-attention. The diversity of CPD kernels allows us to derive various RPEs that enable length extrapolation in a principled way. Experiments demonstrate that the logarithmic variant achieves excellent extrapolation performance on three large language modeling datasets.
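The abstract names a "logarithmic variant" but does not give its formula. The sketch below assumes a bias of the form -r1 * log(1 + r2 * |m - n|) with learnable scalars r1, r2 > 0, added to the raw query-key score q_m · k_n before the Softmax. The helper names (`kerple_log_bias`, `attention_weights`) are hypothetical and for illustration only; this is not the authors' implementation. The final check shows the point made in the abstract: adding a constant offset to every logit of a query leaves the Softmax output unchanged, so the offset that turns a CPD kernel into a PD kernel never needs to be computed explicitly.

```python
import math


def kerple_log_bias(m: int, n: int, r1: float, r2: float) -> float:
    """Assumed logarithmic CPD kernel bias: -r1 * log(1 + r2 * |m - n|).

    r1, r2 > 0 are treated as learnable per-head scalars in this sketch.
    The bias is 0 at distance 0 and decreases slowly as distance grows.
    """
    return -r1 * math.log(1.0 + r2 * abs(m - n))


def softmax(xs):
    """Numerically stable Softmax over a list of floats."""
    mx = max(xs)
    exps = [math.exp(x - mx) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(q_dot_k, query_pos, r1, r2):
    """Causal attention weights for one query at position query_pos.

    q_dot_k[n] is the raw score q_m . k_n for key position n <= query_pos
    (hypothetical input format chosen for this sketch).
    """
    logits = [
        q_dot_k[n] + kerple_log_bias(query_pos, n, r1, r2)
        for n in range(query_pos + 1)
    ]
    return softmax(logits)


if __name__ == "__main__":
    scores = [0.5, 1.0, -0.2, 0.3]  # toy q . k scores for keys 0..3
    w = attention_weights(scores, query_pos=3, r1=1.0, r2=0.5)

    # Adding the same constant offset to every logit (as in the CPD -> PD
    # shift) does not change the Softmax output.
    offset = 10.0
    shifted = softmax(
        [s + kerple_log_bias(3, n, 1.0, 0.5) + offset for n, s in enumerate(scores)]
    )
    print(all(abs(a - b) < 1e-12 for a, b in zip(w, shifted)))
```

Because the logarithm grows slowly, distant tokens are penalized only mildly, which is one plausible reading of why a logarithmic kernel would extrapolate well to sequences longer than those seen in training.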

* The first two authors contributed equally to this work.

View paper on OpenReview
