Nathan Kallus

SNPL: Simultaneous Policy Learning and Evaluation for Safe Multi-Objective Policy Improvement

Mar 17, 2025
Via arXiv

Q♯: Provably Optimal Distributional RL for LLM Post-Training

Feb 27, 2025
Via arXiv

Collaborative Retrieval for Large Language Model-based Conversational Recommender Systems

Feb 19, 2025
Via arXiv

GST-UNet: Spatiotemporal Causal Inference with Time-Varying Confounders

Feb 07, 2025
Via arXiv

Automatic Double Reinforcement Learning in Semiparametric Markov Decision Processes with Applications to Long-Term Causal Inference

Jan 12, 2025
Via arXiv

Reward Maximization for Pure Exploration: Minimax Optimal Good Arm Identification for Nonparametric Multi-Armed Bandits

Oct 21, 2024
Via arXiv

Adjusting Regression Models for Conditional Uncertainty Calibration

Sep 26, 2024
Via arXiv

CSPI-MT: Calibrated Safe Policy Improvement with Multiple Testing for Threshold Policies

Aug 21, 2024
Via arXiv

Estimating Heterogeneous Treatment Effects by Combining Weak Instruments and Observational Data

Jun 10, 2024
Via arXiv

Contextual Linear Optimization with Bandit Feedback

May 26, 2024
Via arXiv