
Nanyi Fei

CoTBal: Comprehensive Task Balancing for Multi-Task Visual Instruction Tuning

Mar 07, 2024

Improvable Gap Balancing for Multi-Task Learning

Jul 28, 2023

VDT: An Empirical Study on Video Diffusion with Transformers

May 22, 2023

LGDN: Language-Guided Denoising Network for Video-Language Modeling

Oct 03, 2022

Multimodal foundation models are better simulators of the human brain

Aug 17, 2022

COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval

Apr 15, 2022

A Roadmap for Big Model

Apr 02, 2022

WenLan 2.0: Make AI Imagine via a Multimodal Foundation Model

Oct 27, 2021

Contrastive Prototype Learning with Augmented Embeddings for Few-Shot Learning

Jan 23, 2021

Meta-Learning across Meta-Tasks for Few-Shot Learning

Mar 09, 2020