Yujun Cai

Detecting and Mitigating Insertion Hallucination in Video-to-Audio Generation

Oct 09, 2025
Via arXiv

ContextNav: Towards Agentic Multimodal In-Context Learning

Oct 06, 2025
Via arXiv

Blockwise SFT for Diffusion Language Models: Reconciling Bidirectional Attention and Autoregressive Decoding

Aug 27, 2025
Via arXiv

VistaWise: Building Cost-Effective Agent with Cross-Modal Knowledge Graph for Minecraft

Aug 26, 2025
Via arXiv

MRFD: Multi-Region Fusion Decoding with Self-Consistency for Mitigating Hallucinations in LVLMs

Aug 14, 2025
Via arXiv

$A^2R^2$: Advancing Img2LaTeX Conversion via Visual Reasoning with Attention-Guided Refinement

Jul 28, 2025
Via arXiv

A Survey of Context Engineering for Large Language Models

Jul 17, 2025
Via arXiv

Symbolic or Numerical? Understanding Physics Problem Solving in Reasoning LLMs

Jul 02, 2025
Via arXiv

Understanding GUI Agent Localization Biases through Logit Sharpness

Jun 18, 2025
Via arXiv

HAIF-GS: Hierarchical and Induced Flow-Guided Gaussian Splatting for Dynamic Scene

Jun 11, 2025
Via arXiv