Haoqi Fan

LLaVA-Critic: Learning to Evaluate Multimodal Models

Oct 03, 2024

Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles

Jun 01, 2023

Diffusion Models as Masked Autoencoders

Apr 06, 2023

The effectiveness of MAE pre-pretraining for billion-scale pretraining

Mar 23, 2023

Reversible Vision Transformers

Feb 09, 2023

MAViL: Masked Audio-Video Learners

Dec 15, 2022

Scaling Language-Image Pre-training via Masking

Dec 01, 2022

Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention During Vision Transformer Inference

Nov 18, 2022

Masked Autoencoders As Spatiotemporal Learners

May 18, 2022

On the Importance of Asymmetry for Siamese Representation Learning

Apr 01, 2022