Mark Schmidt

SIERRA, LIENS

ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning

Mar 12, 2025
Via arXiv

Implicit Bias of SignGD and Adam on Multiclass Separable Data

Feb 07, 2025
Via arXiv

Don't Be So Positive: Negative Step Sizes in Second-Order Methods

Nov 18, 2024
Via arXiv

Why Line Search when you can Plane Search? SO-Friendly Neural Networks allow Per-Iteration Optimization of Learning and Momentum Rates for Every Layer

Jun 25, 2024
Via arXiv

BlockLLM: Memory-Efficient Adaptation of LLMs by Selecting and Optimizing the Right Coordinate Blocks

Jun 25, 2024
Via arXiv

Enhancing Policy Gradient with the Polyak Step-Size Adaption

Apr 11, 2024
Via arXiv

Faster Convergence of Stochastic Accelerated Gradient Descent under Interpolation

Apr 03, 2024
Via arXiv

Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models

Feb 29, 2024
Via arXiv

Analyzing and Improving Greedy 2-Coordinate Updates for Equality-Constrained Optimization via Steepest Descent in the 1-Norm

Jul 03, 2023
Via arXiv

Don't be so Monotone: Relaxing Stochastic Line Search in Over-Parameterized Models

Jun 22, 2023
Via arXiv