Alekh Agarwal

Optimizing Pre-Training Data Mixtures with Mixtures of Data Expert Models

Feb 21, 2025
Via arXiv

Small steps no more: Global convergence of stochastic gradient bandits for arbitrary learning rates

Feb 11, 2025
Via arXiv

Design Considerations in Offline Preference-based RL

Feb 08, 2025
Via arXiv

Catoni Contextual Bandits are Robust to Heavy-tailed Rewards

Feb 04, 2025
Via arXiv

Preserving Expert-Level Privacy in Offline Reinforcement Learning

Nov 18, 2024
Via arXiv

Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning

Oct 10, 2024
Via arXiv

Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning

Jul 22, 2024
Via arXiv

Robust Preference Optimization through Reward Model Distillation

May 29, 2024
Via arXiv

Offline Imitation Learning from Multiple Baselines with Applications to Compiler Optimization

Mar 28, 2024
Via arXiv

Stochastic Gradient Succeeds for Bandits

Feb 27, 2024
Via arXiv