Mathieu Blondel

DMA, CNRS

Stepping on the Edge: Curvature Aware Learning Rate Tuners

Jul 08, 2024
Via arXiv

Learning with Fitzpatrick Losses

May 23, 2024
Via arXiv

The Elements of Differentiable Programming

Mar 21, 2024
Via arXiv

How do Transformers perform In-Context Autoregressive Learning?

Feb 08, 2024
Via arXiv

Implicit Diffusion: Efficient Optimization through Stochastic Sampling

Feb 08, 2024
Via arXiv

Direct Language Model Alignment from Online AI Feedback

Feb 07, 2024
Via arXiv

Decoding-time Realignment of Language Models

Feb 05, 2024
Via arXiv

Routers in Vision Mixture of Experts: An Empirical Study

Jan 29, 2024
Via arXiv

Dual Gauss-Newton Directions for Deep Learning

Aug 17, 2023
Via arXiv

Fast, Differentiable and Sparse Top-k: a Convex Analysis Perspective

Feb 06, 2023
Via arXiv