Siyan Zhao

Self-Distilled Reasoner: On-Policy Self-Distillation for Large Language Models

Jan 26, 2026

The performances of the Chinese and U.S. Large Language Models on the Topic of Chinese Culture

Jan 07, 2026

SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models

Oct 10, 2025

d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning

Apr 16, 2025

Multi-fidelity Reinforcement Learning Control for Complex Dynamical Systems

Apr 08, 2025

Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following in LLMs

Feb 13, 2025

MedMax: Mixed-Modal Instruction Tuning for Training Biomedical Assistants

Dec 17, 2024

DODT: Enhanced Online Decision Transformer Learning through Dreamer's Actor-Critic Trajectory Forecasting

Oct 15, 2024

Probing the Decision Boundaries of In-context Learning in Large Language Models

Jun 17, 2024

Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models

Apr 15, 2024