Jiashi Feng

NUS

MagicArticulate: Make Your 3D Models Articulation-Ready

Feb 18, 2025

Video Depth Anything: Consistent Depth Estimation for Super-Long Videos

Jan 21, 2025

VideoWorld: Exploring Knowledge Learning from Unlabeled Videos

Jan 16, 2025

Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

Jan 07, 2025

Dora: Sampling and Benchmarking for 3D Shape Variational Auto-Encoders

Dec 24, 2024

Parallelized Autoregressive Visual Generation

Dec 19, 2024

Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation

Dec 18, 2024

Image Understanding Makes for A Good Tokenizer for Image Generation

Nov 07, 2024

How Far is Video Generation from World Model: A Physical Law Perspective

Nov 04, 2024

DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution

Nov 04, 2024