Yu-Xiong Wang

Aligning Generative Denoising with Discriminative Objectives Unleashes Diffusion for Visual Perception

Apr 15, 2025

AgMMU: A Comprehensive Agricultural Multimodal Understanding and Reasoning Benchmark

Apr 14, 2025

GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation

Apr 10, 2025

V2Edit: Versatile Video Diffusion Editor for Videos and 3D Scenes

Mar 13, 2025

InterMimic: Towards Universal Whole-Body Control for Physics-Based Human-Object Interactions

Feb 27, 2025

Proposer-Agent-Evaluator(PAE): Autonomous Skill Discovery For Foundation Model Internet Agents

Dec 17, 2024

Can We Generate Visual Programs Without Prompting LLMs?

Dec 11, 2024

PaintScene4D: Consistent 4D Scene Generation from Text Prompts

Dec 05, 2024

RandAR: Decoder-only Autoregressive Visual Generation in Random Orders

Dec 02, 2024

Diff-2-in-1: Bridging Generation and Dense Perception with Diffusion Models

Nov 07, 2024