
Title: Understanding Token Probability Encoding in Output Embeddings

Jun 03, 2024

Hakaze Cho, Yoshihiro Sakai, Kenshiro Tanaka, Mariko Kato, Naoya Inoue



Abstract: In this paper, we investigate the output token probability information in the output embeddings of language models. We provide an approximate common log-linear encoding of output token probabilities within the output embedding vectors and demonstrate that it is accurate and sparse when the output space is large and the output logits are concentrated. Based on these findings, we edit the encoding in the output embeddings to modify the output probability distribution accurately. Moreover, the sparsity we find in the output probability encoding suggests that a large number of dimensions in the output embedding do not contribute to causal language modeling. We therefore attempt to delete these output-unrelated dimensions and find that more than 30% of the dimensions can be deleted without a significant shift in the output distribution or degradation in sequence generation. Additionally, using this encoding as a probe of training dynamics, we find that the output embeddings capture token frequency information in early training steps, even before obvious convergence begins.

* 15 pages, 17 figures, 3 tables
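The abstract mentions two operations on the output embeddings: editing them to shift the output distribution, and deleting dimensions that do not affect the output. As a rough illustration only, the sketch below uses a toy softmax output layer, with logit_t = h · w_t, and random matrices. It is not the authors' method. The edit moves a single token's embedding along the current hidden state h, which is a per-context simplification and not the paper's common log-linear encoding. The pruning ranks dimensions by a hypothetical importance score |h_j| · mean_t |W[t][j]|, which is not the paper's criterion. All function names (`output_distribution`, `edit_token_embedding`, `prune_dimensions`) are hypothetical.

```python
import math
import random

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def output_distribution(h, W, keep=None):
    # Toy output layer: p(t) = softmax_t(h . w_t), where w_t = W[t].
    # `keep` optionally restricts the dot product to a subset of dimensions.
    dims = range(len(h)) if keep is None else keep
    return softmax([sum(h[j] * row[j] for j in dims) for row in W])

def kl_divergence(p, q):
    # KL(p || q) for two distributions over the same vocabulary.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def edit_token_embedding(W, h, token, delta):
    # Hypothetical per-context edit: move row `token` along h so that its
    # logit rises by exactly delta, since h . (w + delta * h / |h|^2) = h . w + delta.
    norm_sq = sum(x * x for x in h)
    W_new = [row[:] for row in W]
    W_new[token] = [w + delta * x / norm_sq for w, x in zip(W[token], h)]
    return W_new

def prune_dimensions(h, W, fraction, drop_lowest=True):
    # Hypothetical importance score per dimension: |h_j| * mean_t |W[t][j]|.
    # Drops int(fraction * d) dimensions (lowest or highest score) and
    # returns the sorted indices of the dimensions that are kept.
    d = len(h)
    score = [abs(h[j]) * sum(abs(row[j]) for row in W) / len(W) for j in range(d)]
    order = sorted(range(d), key=lambda j: score[j], reverse=not drop_lowest)
    n_drop = int(fraction * d)
    return sorted(order[n_drop:])

# Toy setup (assumption): 50-token vocabulary, 16-dimensional embeddings,
# and a hidden state that barely uses its first 6 dimensions.
rng = random.Random(0)
V, d = 50, 16
h = [(0.01 if j < 6 else 1.0) * rng.gauss(0.0, 1.0) for j in range(d)]
W = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(V)]
p = output_distribution(h, W)

# Editing: raise token 3's log-odds against token 7 (and every other token) by 1.5.
p_edit = output_distribution(h, edit_token_embedding(W, h, token=3, delta=1.5))
shift = math.log(p_edit[3] / p_edit[7]) - math.log(p[3] / p[7])

# Pruning: drop 30% of the dimensions (lowest vs. highest score) and compare KL.
kept_low = prune_dimensions(h, W, 0.3)
kept_high = prune_dimensions(h, W, 0.3, drop_lowest=False)
kl_low = kl_divergence(p, output_distribution(h, W, kept_low))
kl_high = kl_divergence(p, output_distribution(h, W, kept_high))

print(f"log-odds shift: {shift:.6f}")
print(f"KL after dropping low-score dims:  {kl_low:.2e}")
print(f"KL after dropping high-score dims: {kl_high:.2e}")
```

Because the edit adds delta · h / ‖h‖² to a single row, that token's logit rises by exactly delta while all other logits are unchanged, so its log-odds against any other token shift by delta. The pruning comparison only shows that, in this toy setting, dimensions the hidden state barely uses can be dropped with far less change in the distribution than dropping heavily used ones. It says nothing about the 30% figure reported for real models.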
