Enxin Song

Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis

Oct 10, 2024
Via arXiv

AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark

Oct 04, 2024
Via arXiv

MovieChat+: Question-aware Sparse Memory for Long Video Question Answering

Apr 26, 2024
Via arXiv

Devil in the Number: Towards Robust Multi-modality Data Filter

Sep 24, 2023
Via arXiv

MovieChat: From Dense Token to Sparse Memory for Long Video Understanding

Jul 31, 2023
Via arXiv