Yuntian Deng

Long Grounded Thoughts: Distilling Compositional Visual Reasoning Chains at Scale

Nov 07, 2025

TokDrift: When LLM Speaks in Subwords but Code Speaks in Grammar

Oct 16, 2025

Interactive Training: Feedback-Driven Neural Network Optimization

Oct 02, 2025

From Chat Logs to Collective Insights: Aggregative Question Answering

May 29, 2025

Learn to Reason Efficiently with Adaptive Length-based Reward Shaping

May 21, 2025

The Leaderboard Illusion

Apr 29, 2025

WildVis: Open Source Visualizer for Million-Scale Chat Logs in the Wild

Sep 05, 2024

WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries

Jul 24, 2024

Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

Jun 12, 2024

WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild

Jun 07, 2024