Zhengyuan Yang

OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation

Dec 12, 2024

ShowUI: One Vision-Language-Action Model for GUI Visual Agent

Nov 26, 2024

Conditional Text-to-Image Generation with Reference Guidance

Nov 22, 2024

GenXD: Generating Any 3D and 4D Scenes

Nov 05, 2024

LiVOS: Light Video Object Segmentation with Gated Linear Matching

Nov 05, 2024

SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation

Oct 30, 2024

MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models

Oct 13, 2024

Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization

Oct 04, 2024

EditRoom: LLM-parameterized Graph Diffusion for Composable 3D Room Layout Editing

Oct 03, 2024

AutoDirector: Online Auto-scheduling Agents for Multi-sensory Composition

Aug 21, 2024