Nathan Kallus

DiFFPO: Training Diffusion LLMs to Reason Fast and Furious via Reinforcement Learning

Oct 02, 2025

Entropy After $\langle \texttt{/Think} \rangle$ for reasoning model early exiting

Sep 30, 2025

Efficient Adaptive Experimentation with Non-Compliance

May 23, 2025

Value-Guided Search for Efficient Chain-of-Thought Reasoning

May 23, 2025

Nonparametric Instrumental Variable Inference with Many Weak Instruments

May 12, 2025

From Reviews to Dialogues: Active Synthesis for Zero-Shot LLM-based Conversational Recommender System

Apr 21, 2025

SNPL: Simultaneous Policy Learning and Evaluation for Safe Multi-Objective Policy Improvement

Mar 17, 2025

$Q\sharp$: Provably Optimal Distributional RL for LLM Post-Training

Feb 27, 2025

Collaborative Retrieval for Large Language Model-based Conversational Recommender Systems

Feb 19, 2025

GST-UNet: Spatiotemporal Causal Inference with Time-Varying Confounders

Feb 07, 2025