Ruoyi Du

I-Max: Maximize the Resolution Potential of Pre-trained Rectified Flow Transformers with Projected Flow

Oct 10, 2024
Via arXiv

Zero-Shot Audio Captioning Using Soft and Hard Prompts

Jun 10, 2024
Via arXiv

Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers

May 09, 2024
Via arXiv

DemoFusion: Democratising High-Resolution Image Generation With No $$$

Nov 24, 2023
Via arXiv

Multi-View Active Fine-Grained Recognition

Jun 02, 2022
Via arXiv

Learning Invariant Visual Representations for Compositional Zero-Shot Learning

Jun 02, 2022
Via arXiv

Caption Feature Space Regularization for Audio Captioning

Apr 18, 2022
Via arXiv

Domain Generalization via Frequency-based Feature Disentanglement and Interaction

Jan 20, 2022
Via arXiv

Clue Me In: Semi-Supervised FGVC with Out-of-Distribution Data

Dec 06, 2021
Via arXiv

Making a Bird AI Expert Work for You and Me

Dec 06, 2021
Via arXiv