Kevin Lin

LiVOS: Light Video Object Segmentation with Gated Linear Matching

Nov 05, 2024

GenXD: Generating Any 3D and 4D Scenes

Nov 05, 2024

DexMimicGen: Automated Data Generation for Bimanual Dexterous Manipulation via Imitation Learning

Oct 31, 2024

SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation

Oct 30, 2024

Meta-DiffuB: A Contextualized Sequence-to-Sequence Text Diffusion Model with Meta-Exploration

Oct 17, 2024

Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization

Oct 04, 2024

EditRoom: LLM-parameterized Graph Diffusion for Composable 3D Room Layout Editing

Oct 03, 2024

MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities

Aug 01, 2024

IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation

Jul 15, 2024

MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

Jun 12, 2024