VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment

Add code
Oct 09, 2022
Figure 1 for VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment
Figure 2 for VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment
Figure 3 for VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment
Figure 4 for VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment

Share this with someone who'll enjoy it:

View paper onarxiv iconopen_review iconOpenReview

Share this with someone who'll enjoy it: