Meisam Razaviyayn

Memory Caching: RNNs with Growing Memory

Feb 27, 2026

Less is More: Convergence Benefits of Fewer Data Weight Updates over Longer Horizon

Feb 23, 2026

Nested Learning: The Illusion of Deep Learning Architectures

Dec 31, 2025

TNT: Improving Chunkwise Training for Test-Time Memorization

Nov 10, 2025

Sampling and Loss Weights in Multi-Domain Training

Nov 10, 2025

Memory-Efficient Differentially Private Training with Gradient Random Projection

Jun 18, 2025

ATLAS: Learning to Optimally Memorize the Context at Test Time

May 29, 2025

It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization

Apr 17, 2025

Synthetic Text Generation for Training Large Language Models via Gradient Matching

Feb 24, 2025

PiKE: Adaptive Data Mixing for Multi-Task Learning Under Low Gradient Conflicts

Feb 10, 2025