Mengping Yang

Unraveling MMDiT Blocks: Training-free Analysis and Enhancement of Text-conditioned Diffusion

Jan 05, 2026
Via arXiv

A unified multimodal understanding and generation model for cross-disciplinary scientific research

Jan 04, 2026
Via arXiv

Uni-cot: Towards Unified Chain-of-Thought Reasoning Across Text and Vision

Aug 07, 2025
Via arXiv

Omni-Video: Democratizing Unified Video Understanding and Generation

Jul 09, 2025
Via arXiv

Cockatiel: Ensembling Synthetic and Human Preferenced Training for Detailed Video Caption

Mar 12, 2025
Via arXiv

Adversarial Semantic Augmentation for Training Generative Adversarial Networks under Limited Data

Feb 02, 2025
Via arXiv

E2EDiff: Direct Mapping from Noise to Data for Enhanced Diffusion Models

Dec 30, 2024
Via arXiv

EVALALIGN: Supervised Fine-Tuning Multimodal LLMs with Human-Aligned Data for Evaluating Text-to-Image Models

Jun 27, 2024
Via arXiv

EvalAlign: Evaluating Text-to-Image Models through Precision Alignment of Multimodal Large Models with Supervised Fine-Tuning to Human Annotations

Jun 24, 2024
Via arXiv

Attention Calibration for Disentangled Text-to-Image Personalization

Apr 11, 2024
Via arXiv