VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text

Add code
Apr 22, 2021
Figure 1 for VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Figure 2 for VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Figure 3 for VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Figure 4 for VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text

Share this with someone who'll enjoy it:

View paper onarxiv iconopen_review iconOpenReview

Share this with someone who'll enjoy it: