Jiashi Feng

NUS

Image Understanding Makes for A Good Tokenizer for Image Generation

Nov 07, 2024
Via arXiv

DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution

Nov 04, 2024
Via arXiv

How Far is Video Generation from World Model: A Physical Law Perspective

Nov 04, 2024
Via arXiv

LVD-2M: A Long-take Video Dataset with Temporally Dense Captions

Oct 14, 2024
Via arXiv

Loong: Generating Minute-level Long Videos with Autoregressive Language Models

Oct 03, 2024
Via arXiv

High Quality Human Image Animation using Regional Supervision and Motion Blur Condition

Sep 29, 2024
Via arXiv

Hierarchical Memory for Long Video QA

Jun 30, 2024
Via arXiv

Depth Anything V2

Jun 13, 2024
Via arXiv

Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams

Jun 12, 2024
Via arXiv

Empowering Visual Creativity: A Vision-Language Assistant to Image Editing Recommendations

May 31, 2024
Via arXiv