Title: Semiparametric Token-Sequence Co-Supervision

Mar 14, 2024

Hyunji Lee, Doyoung Kim, Jihoon Jun, Sejune Joo, Joel Jang, Kyoung-Woon On, Minjoon Seo

Figures 1-4 for Semiparametric Token-Sequence Co-Supervision

Abstract: In this work, we introduce a semiparametric token-sequence co-supervision training method. It trains a language model by simultaneously leveraging supervision from two losses: the traditional next token prediction loss, which is calculated over the parametric token embedding space, and the next sequence prediction loss, which is calculated over the nonparametric sequence embedding space. The nonparametric sequence embedding space is constructed by a separate language model tasked with condensing an input text into a single representative embedding. Our experiments demonstrate that a model trained with both forms of supervision consistently surpasses models trained with either one alone. Analysis suggests that this co-supervision encourages broader generalization across the model. In particular, the robustness of the parametric token space, which is established during pretraining, tends to enhance the stability of the nonparametric sequence embedding space, a new space established by another language model.
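
The abstract does not spell out the form of either loss, so the following is only a minimal sketch of how the two supervision signals could be combined. It assumes both losses are softmax cross-entropies over dot-product scores, that the sequence embeddings arrive precomputed from the separate model, and that the two terms are summed with a weight. All function names, the `weight` parameter, and the toy vectors are hypothetical and not taken from the paper.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax over a list of floats."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def next_token_loss(hidden, token_embeddings, target_token):
    """Cross-entropy over the parametric token embedding table."""
    logits = [dot(hidden, e) for e in token_embeddings]
    return -log_softmax(logits)[target_token]


def next_sequence_loss(hidden, sequence_embeddings, target_sequence):
    """Cross-entropy over candidate sequence embeddings.

    Assumption: the candidates are embeddings produced by a separate
    model, and the loss is contrastive (softmax over dot products).
    """
    scores = [dot(hidden, e) for e in sequence_embeddings]
    return -log_softmax(scores)[target_sequence]


def co_supervision_loss(hidden, token_embeddings, target_token,
                        sequence_embeddings, target_sequence, weight=1.0):
    """Sum of the two losses; `weight` is a hypothetical mixing factor."""
    ntp = next_token_loss(hidden, token_embeddings, target_token)
    nsp = next_sequence_loss(hidden, sequence_embeddings, target_sequence)
    return ntp + weight * nsp


# Toy usage: a 2-dimensional hidden state, 4 tokens, 2 candidate sequences.
hidden = [0.5, -0.2]
token_embeddings = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
sequence_embeddings = [[0.9, 0.1], [-0.3, 0.8]]
loss = co_supervision_loss(hidden, token_embeddings, 0,
                           sequence_embeddings, 0)
print(f"combined loss: {loss:.4f}")
```

In this sketch the same hidden state is scored against both spaces, which is one simple way to let gradients from both losses reach the same model; the paper may arrange this differently.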
