Wenqi Shao

DexHandDiff: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation

Dec 11, 2024

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Dec 06, 2024

CLAP: Unsupervised 3D Representation Learning for Fusion 3D Perception via Curvature Sampling and Prototype Learning

Dec 04, 2024

TREND: Unsupervised 3D Representation Learning via Temporal Forecasting for LiDAR Perception

Dec 04, 2024

GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation

Dec 01, 2024

DexDiffuser: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation

Nov 27, 2024

EMOS: Embodiment-aware Heterogeneous Multi-robot Operating System with LLM Agents

Oct 30, 2024

TP-Eval: Tap Multimodal LLMs' Potential in Evaluation by Customizing Prompts

Oct 23, 2024

ZipVL: Efficient Large Vision-Language Models with Dynamic Token Sparsification and KV Cache Compression

Oct 11, 2024

Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping

Oct 11, 2024