Nathan Kallus

Exploration in the Limit

Dec 31, 2025

Stationary Reweighting Yields Local Convergence of Soft Fitted Q-Iteration

Dec 30, 2025

Efficient Inference for Inverse Reinforcement Learning and Dynamic Discrete Choice Models

Dec 30, 2025

Bellman Calibration for V-Learning in Offline Reinforcement Learning

Dec 29, 2025

Fitted Q Evaluation Without Bellman Completeness via Stationary Weighting

Dec 29, 2025

Semiparametric Preference Optimization: Your Language Model is Secretly a Single-Index Model

Dec 26, 2025

The Value of Personalized Recommendations: Evidence from Netflix

Nov 11, 2025

DiFFPO: Training Diffusion LLMs to Reason Fast and Furious via Reinforcement Learning

Oct 02, 2025

Entropy After $\langle \texttt{/Think} \rangle$ for reasoning model early exiting

Sep 30, 2025

Value-Guided Search for Efficient Chain-of-Thought Reasoning

May 23, 2025