Andrew Tao

RADIO Amplified: Improved Baselines for Agglomerative Vision Foundation Models

Dec 10, 2024

OMCAT: Omni Context Aware Transformer

Oct 15, 2024

PHI-S: Distribution Balancing for Label-Free Multi-Teacher Distillation

Oct 02, 2024

Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

Aug 28, 2024

Wolf: Captioning Everything with a World Summarization Framework

Jul 26, 2024

X-VILA: Cross-Modality Alignment for Large Language Model

May 29, 2024

VILA: On Pre-training for Visual Language Models

Dec 14, 2023

FasterViT: Fast Vision Transformers with Hierarchical Attention

Jun 09, 2023

Progressive Learning of 3D Reconstruction Network from 2D GAN Data

May 18, 2023

Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models

May 17, 2023