Yining Hong

SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation
Oct 30, 2024

What Is Wrong with My Model? Identifying Systematic Problems with Semantic Data Slicing
Sep 14, 2024

FlexAttention for Efficient High-Resolution Vision-Language Models
Jul 29, 2024

3D-VLA: A 3D Vision-Language-Action Generative World Model
Mar 14, 2024

MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World
Jan 16, 2024

GENOME: GenerativE Neuro-symbOlic visual reasoning by growing and reusing ModulEs
Nov 08, 2023

CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding
Nov 06, 2023

3D-LLM: Injecting the 3D World into Large Language Models
Jul 24, 2023

3D Concept Learning and Reasoning from Multi-View Images
Mar 20, 2023

See, Think, Confirm: Interactive Prompting Between Vision and Language Models for Knowledge-based Visual Reasoning
Jan 12, 2023