Yue Huang

SpecAlign: Efficient Specification-Grounded Alignment of Large Language Models via Synthetic Data

Jun 17, 2026
Via arXiv

UXBench: Measuring the Actionability of LLM-Generated UX Critiques

Jun 15, 2026
Via arXiv

Getting Better at Working With You: Compiling User Corrections into Runtime Enforcement for Coding Agents

Jun 11, 2026
Via arXiv

DOG-DPO: Dynamic Optimization in Geometry for Safety Alignment

Jun 04, 2026
Via arXiv

AutoLab: Can Frontier Models Solve Long-Horizon Auto Research and Engineering Tasks?

Jun 03, 2026
Via arXiv

Converted, Not Equivalent: Benchmarking Codebase Conversion via Observational Equivalence

May 27, 2026
Via arXiv

JobBench: Aligning Agent Work With Human Will

May 25, 2026
Via arXiv

AgentTrap: Measuring Runtime Trust Failures in Third-Party Agent Skills

May 13, 2026
Via arXiv

Visual Aesthetic Benchmark: Can Frontier Models Judge Beauty?

May 12, 2026
Via arXiv

Prompt-Activation Duality: Improving Activation Steering via Attention-Level Interventions

May 11, 2026
Via arXiv