ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision

Feb 05, 2021


View paper on arXiv
