Pierre Ablin

Ecole normale supérieure, Paris, France

Scaling Laws for Forgetting during Finetuning with Pretraining Data Injection

Feb 09, 2025
Via arXiv

Soup-of-Experts: Pretraining Specialist Models via Parameters Averaging

Feb 03, 2025
Via arXiv

A Unified Perspective on the Dynamics of Deep Transformers

Jan 30, 2025
Via arXiv

MVICAD2: Multi-View Independent Component Analysis with Delays and Dilations

Jan 13, 2025
Via arXiv

Sparse Repellency for Shielded Generation in Text-to-image Diffusion Models

Oct 10, 2024
Via arXiv

Dynamic Gradient Alignment for Online Data Mixing

Oct 03, 2024
Via arXiv

Theory, Analysis, and Best Practices for Sigmoid Self-Attention

Sep 06, 2024
Via arXiv

The AdEMAMix Optimizer: Better, Faster, Older

Sep 05, 2024
Via arXiv

Optimization without retraction on the random generalized Stiefel manifold

May 02, 2024
Via arXiv

Enhancing Hypergradients Estimation: A Study of Preconditioning and Reparameterization

Feb 26, 2024
Via arXiv