Jinheng Xie

Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
Aug 22, 2024

Learning Video Context as Interleaved Multimodal Sequences
Jul 31, 2024

WMAdapter: Adding WaterMark Control to Latent Diffusion Models
Jun 12, 2024

Learning Long-form Video Prior via Generative Pre-Training
Apr 24, 2024

Towards Highly Realistic Artistic Style Transfer via Stable Diffusion with Step-aware and Layer-aware Prompt
Apr 17, 2024

Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models
Apr 03, 2024

Question-Answer Cross Language Image Matching for Weakly Supervised Semantic Segmentation
Jan 18, 2024

HEAP: Unsupervised Object Discovery and Localization with Contrastive Grouping
Jan 04, 2024

TCSloT: Text Guided 3D Context and Slope Aware Triple Network for Dental Implant Position Prediction
Aug 10, 2023

BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion
Aug 10, 2023