Pichao Wang

Beyond Speaker Identity: Text Guided Target Speech Extraction

Jan 15, 2025

FlexDiT: Dynamic Token Density Control for Diffusion Transformer

Dec 08, 2024

Factorized Visual Tokenization and Generation

Nov 25, 2024

Enhancing Motion in Text-to-Video Generation with Decomposed Encoding and Conditioning

Oct 31, 2024

One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos

Sep 29, 2024

GQE: Generalized Query Expansion for Enhanced Text-Video Retrieval

Aug 14, 2024

Hallucination of Multimodal Large Language Models: A Survey

Apr 29, 2024

Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval

Mar 26, 2024

Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation

Nov 20, 2023

Human Pose-based Estimation, Tracking and Action Recognition with Deep Learning: A Survey

Oct 19, 2023