Max Vladymyrov

UC Merced

How new data permeates LLM knowledge and how to dilute it
Apr 13, 2025

Long Context In-Context Compression by Getting to the Gist of Gisting
Apr 11, 2025

Learning and Unlearning of Fabricated Knowledge in Language Models
Oct 29, 2024

Narrowing the Focus: Learned Optimizers for Pretrained Models
Aug 21, 2024

Linear Transformers are Versatile In-Context Learners
Feb 21, 2024

Uncovering mesa-optimization algorithms in Transformers
Sep 11, 2023

Continual Few-Shot Learning Using HyperTransformers
Jan 12, 2023

Training trajectories, mini-batch losses and the curious role of the learning rate
Jan 05, 2023

Transformers learn in-context by gradient descent
Dec 15, 2022

Decentralized Learning with Multi-Headed Distillation
Nov 28, 2022