Wei Zhai

University of Science and Technology of China, China; JD Explore Academy, JD.com, China

Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning

Dec 12, 2024
Via arXiv

Event-Based Tracking Any Point with Motion-Augmented Temporal Consistency

Dec 02, 2024
Via arXiv

GREAT: Geometry-Intention Collaborative Inference for Open-Vocabulary 3D Object Affordance Grounding

Nov 29, 2024
Via arXiv

Leverage Task Context for Object Affordance Ranking

Nov 25, 2024
Via arXiv

Improved Video VAE for Latent Video Diffusion Model

Nov 10, 2024
Via arXiv

EF-3DGS: Event-Aided Free-Trajectory 3D Gaussian Splatting

Oct 20, 2024
Via arXiv

MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling

Oct 15, 2024
Via arXiv

Visual-Geometric Collaborative Guidance for Affordance Learning

Oct 15, 2024
Via arXiv

MentalGLM Series: Explainable Large Language Models for Mental Health Analysis on Chinese Social Media

Oct 14, 2024
Via arXiv

VMAD: Visual-enhanced Multimodal Large Language Model for Zero-Shot Anomaly Detection

Sep 30, 2024
Via arXiv