Le Zhuo

Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation

Dec 12, 2024

VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection

Nov 22, 2024

Customize Your Visual Autoregressive Recipe with Set Autoregressive Modeling

Oct 14, 2024

I-Max: Maximize the Resolution Potential of Pre-trained Rectified Flow Transformers with Projected Flow

Oct 10, 2024

LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation

Aug 28, 2024

Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining

Aug 05, 2024

Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers

May 09, 2024

ProtLLM: An Interleaved Protein-Language LLM with Protein-as-Word Pre-Training

Feb 28, 2024

LLMs as Visual Explainers: Advancing Image Classification with Evolving Visual Descriptions

Nov 20, 2023

GraphText: Graph Reasoning in Text Space

Oct 02, 2023