Andrew Tao

RADIO Amplified: Improved Baselines for Agglomerative Vision Foundation Models

Dec 10, 2024
Via arXiv

OMCAT: Omni Context Aware Transformer

Oct 15, 2024
Via arXiv

PHI-S: Distribution Balancing for Label-Free Multi-Teacher Distillation

Oct 02, 2024
Via arXiv

Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

Aug 28, 2024
Via arXiv

Wolf: Captioning Everything with a World Summarization Framework

Jul 26, 2024
Via arXiv

X-VILA: Cross-Modality Alignment for Large Language Model

May 29, 2024
Via arXiv

VILA: On Pre-training for Visual Language Models

Dec 14, 2023
Via arXiv

FasterViT: Fast Vision Transformers with Hierarchical Attention

Jun 09, 2023
Via arXiv

Progressive Learning of 3D Reconstruction Network from 2D GAN Data

May 18, 2023
Via arXiv

Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models

May 17, 2023
Via arXiv