Paul Mineiro

Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning

Oct 29, 2024
Via arXiv

Active, anytime-valid risk controlling prediction sets

Jun 15, 2024
Via arXiv

Online Joint Fine-tuning of Multi-Agent Flows

Jun 06, 2024
Via arXiv

Provably Efficient Interactive-Grounded Learning with Personalized Reward

May 31, 2024
Via arXiv

Aligning LLM Agents by Learning Latent Preference from User Edits

Apr 23, 2024
Via arXiv

Efficient Contextual Bandits with Uninformed Feedback Graphs

Feb 12, 2024
Via arXiv

Time-uniform confidence bands for the CDF under nonstationarity

Feb 28, 2023
Via arXiv

Graph Feedback via Reduction to Regression

Feb 17, 2023
Via arXiv

Infinite Action Contextual Bandits with Reusable Data Exhaust

Feb 16, 2023
Via arXiv

Personalized Reward Learning with Interaction-Grounded Learning (IGL)

Nov 28, 2022
Via arXiv