Jianwei Yang

TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies

Dec 13, 2024

OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation

Dec 12, 2024

Mojito: Motion Trajectory and Intensity Control for Video Generation

Dec 12, 2024

Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion

Dec 05, 2024

Latent Action Pretraining from Videos

Oct 15, 2024

TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models

Oct 15, 2024

Towards Flexible Visual Relationship Segmentation

Aug 15, 2024

OmniParser for Pure Vision Based GUI Agent

Aug 01, 2024

V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results

Jun 17, 2024

DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs

Jun 06, 2024