Pichao Wang

FlexDiT: Dynamic Token Density Control for Diffusion Transformer

Dec 08, 2024

Factorized Visual Tokenization and Generation

Nov 25, 2024

Enhancing Motion in Text-to-Video Generation with Decomposed Encoding and Conditioning

Oct 31, 2024

One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos

Sep 29, 2024

GQE: Generalized Query Expansion for Enhanced Text-Video Retrieval

Aug 14, 2024

Hallucination of Multimodal Large Language Models: A Survey

Apr 29, 2024

Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval

Mar 26, 2024

Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation

Nov 20, 2023

Human Pose-based Estimation, Tracking and Action Recognition with Deep Learning: A Survey

Oct 19, 2023

SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels

Sep 18, 2023