
Zhengyuan Yang

SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement

Apr 10, 2025

V-MAGE: A Game Evaluation Framework for Assessing Visual-Centric Capabilities in Multimodal Large Language Models

Apr 08, 2025

Measurement of LLM's Philosophies of Human Nature

Apr 03, 2025

Beyond Words: Advancing Long-Text Image Generation via Multimodal Autoregressive Models

Mar 26, 2025

Zero-Shot Audio-Visual Editing via Cross-Modal Delta Denoising

Mar 26, 2025

ImageGen-CoT: Enhancing Text-to-Image In-context Learning with Chain-of-Thought Reasoning

Mar 25, 2025

TextAtlas5M: A Large-scale Dataset for Dense Text Image Generation

Feb 11, 2025

ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding

Jan 09, 2025

Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark

Jan 09, 2025

OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation

Dec 12, 2024