
Aldo Pacchiano

Scaling In-Context Online Learning Capability of LLMs via Cross-Episode Meta-RL

Feb 03, 2026
Via arXiv

Principled Fine-tuning of LLMs from User-Edits: A Medley of Preference, Supervision, and Reward

Jan 27, 2026
Via arXiv

Enhancing Diversity in Large Language Models via Determinantal Point Processes

Sep 05, 2025
Via arXiv

On the Hardness of Bandit Learning

Jun 17, 2025
Via arXiv

Meet Me at the Arm: The Cooperative Multi-Armed Bandits Problem with Shareable Arms

Jun 11, 2025
Via arXiv

Pure Exploration with Feedback Graphs

Mar 10, 2025
Via arXiv

Language Model Personalization via Reward Factorization

Mar 08, 2025
Via arXiv

Adaptive Exploration for Multi-Reward Multi-Policy Evaluation

Feb 04, 2025
Via arXiv

ORSO: Accelerating Reward Design via Online Reward Selection and Policy Optimization

Oct 17, 2024
Via arXiv

State-free Reinforcement Learning

Sep 27, 2024
Via arXiv