Bingyi Kang

Image Understanding Makes for A Good Tokenizer for Image Generation

Nov 07, 2024
Via arXiv

DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution

Nov 04, 2024
Via arXiv

How Far is Video Generation from World Model: A Physical Law Perspective

Nov 04, 2024
Via arXiv

Loong: Generating Minute-level Long Videos with Autoregressive Language Models

Oct 03, 2024
Via arXiv

Depth Anything V2

Jun 13, 2024
Via arXiv

Improving Token-Based World Models with Parallel Observation Prediction

Feb 13, 2024
Via arXiv

Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

Jan 19, 2024
Via arXiv

Harnessing Diffusion Models for Visual Perception with Meta Prompts

Dec 22, 2023
Via arXiv

FreeMask: Synthetic Images with Dense Annotations Make Stronger Segmentation Models

Oct 23, 2023
Via arXiv

Understanding, Predicting and Better Resolving Q-Value Divergence in Offline-RL

Oct 06, 2023
Via arXiv