Yilun Zhou

Massachusetts Institute of Technology

GTA: Generating Long-Horizon Tasks for Web Agents at Scale

May 28, 2026

The Hidden Signal of Verifier Strictness: Controlling and Improving Step-Wise Verification via Selective Latent Steering

May 20, 2026

VIBEPASS: Can Vibe Coders Really Pass the Vibe Check?

Mar 16, 2026

MMPersuade: A Dataset and Evaluation Framework for Multimodal Persuasion

Oct 26, 2025

All for One: LLMs Solve Mental Math at the Last Token With Information Transferred From Other Tokens

Sep 11, 2025

J4R: Learning to Judge with Equivalent Initial State Group Relative Preference Optimization

May 19, 2025

Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators

Apr 21, 2025

BingoGuard: LLM Content Moderation Tools with Risk Levels

Mar 09, 2025

Direct Judgement Preference Optimization

Sep 23, 2024

Shared Imagination: LLMs Hallucinate Alike

Jul 23, 2024