Teng Xiao

Meta-Reinforcement Learning with Self-Reflection for Agentic Search

Mar 11, 2026
Via arXiv

Small Reward Models via Backward Inference

Feb 14, 2026
Via arXiv

Olmo 3

Dec 15, 2025
Via arXiv

Simple Denoising Diffusion Language Models

Oct 27, 2025
Via arXiv

Incentivizing Strong Reasoning from Weak Supervision

May 28, 2025
Via arXiv

Inference-time Alignment in Continuous Space

May 26, 2025
Via arXiv

Incentivizing Reasoning from Weak Supervision

May 26, 2025
Via arXiv

InfoPO: On Mutual Information Maximization for Large Language Model Alignment

May 13, 2025
Via arXiv

A Deep Single Image Rectification Approach for Pan-Tilt-Zoom Cameras

Apr 09, 2025
Via arXiv

On a Connection Between Imitation Learning and RLHF

Mar 07, 2025
Via arXiv