Pichao Wang

CRPO: Confidence-Reward Driven Preference Optimization for Machine Translation

Jan 23, 2025

Beyond Speaker Identity: Text Guided Target Speech Extraction

Jan 15, 2025

FlexDiT: Dynamic Token Density Control for Diffusion Transformer

Dec 08, 2024

Factorized Visual Tokenization and Generation

Nov 25, 2024

Enhancing Motion in Text-to-Video Generation with Decomposed Encoding and Conditioning

Oct 31, 2024

One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos

Sep 29, 2024

GQE: Generalized Query Expansion for Enhanced Text-Video Retrieval

Aug 14, 2024

Hallucination of Multimodal Large Language Models: A Survey

Apr 29, 2024

Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval

Mar 26, 2024

Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation

Nov 20, 2023