Picture for Xiangtai Li

Xiangtai Li

DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation

Add code
Dec 10, 2024
Viaarxiv icon

EMOv2: Pushing 5M Vision Model Frontier

Add code
Dec 09, 2024
Viaarxiv icon

HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing

Add code
Dec 05, 2024
Viaarxiv icon

SIDA: Social Media Image Deepfake Detection, Localization and Explanation with Large Multimodal Model

Add code
Dec 05, 2024
Viaarxiv icon

DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation

Add code
Dec 04, 2024
Figure 1 for DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation
Figure 2 for DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation
Figure 3 for DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation
Figure 4 for DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation
Viaarxiv icon

RelationBooth: Towards Relation-Aware Customized Object Generation

Add code
Oct 30, 2024
Viaarxiv icon

Synergistic Dual Spatial-aware Generation of Image-to-Text and Text-to-Image

Add code
Oct 20, 2024
Viaarxiv icon

Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation

Add code
Oct 14, 2024
Figure 1 for Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation
Figure 2 for Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation
Figure 3 for Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation
Figure 4 for Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation
Viaarxiv icon

Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis

Add code
Oct 10, 2024
Figure 1 for Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis
Figure 2 for Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis
Figure 3 for Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis
Figure 4 for Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis
Viaarxiv icon

PredFormer: Transformers Are Effective Spatial-Temporal Predictive Learners

Add code
Oct 07, 2024
Viaarxiv icon