Zhan Tong

Contextual AD Narration with Interleaved Multimodal Sequence

Mar 19, 2024
Via arXiv

TagAlign: Improving Vision-Language Alignment with Multi-Tag Classification

Dec 26, 2023
Via arXiv

Bootstrapping SparseFormers from Vision Foundation Models

Dec 04, 2023
Via arXiv

Advancing Vision Transformers with Group-Mix Attention

Nov 26, 2023
Via arXiv

Speed Co-Augmentation for Unsupervised Audio-Visual Pre-training

Sep 25, 2023
Via arXiv

TVTSv2: Learning Out-of-the-box Spatiotemporal Visual Representations at Scale

May 23, 2023
Via arXiv

VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking

Apr 18, 2023
Via arXiv

Efficient Video Action Detection with Token Dropout and Context Refinement

Apr 17, 2023
Via arXiv

SparseFormer: Sparse Visual Recognition via Limited Latent Tokens

Apr 07, 2023
Via arXiv

Soft Neighbors are Positive Supporters in Contrastive Visual Representation Learning

Mar 30, 2023
Via arXiv