Pichao Wang

Enhancing Motion in Text-to-Video Generation with Decomposed Encoding and Conditioning

Oct 31, 2024
Via arXiv

One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos

Sep 29, 2024
Via arXiv

GQE: Generalized Query Expansion for Enhanced Text-Video Retrieval

Aug 14, 2024
Via arXiv

Hallucination of Multimodal Large Language Models: A Survey

Apr 29, 2024
Via arXiv

Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval

Mar 26, 2024
Via arXiv

Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation

Nov 20, 2023
Via arXiv

Human Pose-based Estimation, Tracking and Action Recognition with Deep Learning: A Survey

Oct 19, 2023
Via arXiv

SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels

Sep 18, 2023
Via arXiv

Multi-stage Factorized Spatio-Temporal Representation for RGB-D Action and Gesture Recognition

Sep 11, 2023
Via arXiv

Revisiting Vision Transformer from the View of Path Ensemble

Aug 12, 2023
Via arXiv