Title: Transformer is All You Need: Multimodal Multitask Learning with a Unified Transformer

Feb 22, 2021

Ronghang Hu, Amanpreet Singh

Abstract: We propose UniT, a Unified Transformer model to simultaneously learn the most prominent tasks across different domains, ranging from object detection to language understanding and multimodal reasoning. Based on the transformer encoder-decoder architecture, our UniT model encodes each input modality with an encoder and makes predictions on each task with a shared decoder over the encoded input representations, followed by task-specific output heads. The entire model is jointly trained end-to-end with losses from each task. Compared to previous efforts on multi-task learning with transformers, we share the same model parameters across all tasks instead of separately fine-tuning task-specific models, and we handle a much higher variety of tasks across different domains. In our experiments, we learn 7 tasks jointly over 8 datasets, achieving comparable performance to well-established prior work on each domain under the same supervision with a compact set of model parameters. Code will be released in MMF at https://mmf.sh.

* 15 pages
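
The data flow described in the abstract (one encoder per input modality, a single decoder shared by all tasks, a task-specific output head, and a joint loss summed over tasks) can be illustrated with a toy sketch. This is not the UniT implementation and does not use the MMF API: the class `ToyUnifiedModel`, the helpers `linear`, `mean_pool`, and `joint_loss`, the modality names `image` and `text`, and the task keys are all hypothetical. Plain linear maps and mean pooling stand in for the transformer encoders and the attention-based decoder, and squared error stands in for the real task losses.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]


def linear(weights: List[Vector]) -> Callable[[Vector], Vector]:
    """Toy linear layer x -> W x, standing in for a real network block."""
    def apply(x: Vector) -> Vector:
        return [sum(w * v for w, v in zip(row, x)) for row in weights]
    return apply


def mean_pool(vectors: List[Vector]) -> Vector:
    """Average equal-length vectors (a crude stand-in for attention)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


class ToyUnifiedModel:
    """Hypothetical illustration of the routing only, not the paper's model."""

    def __init__(self, encoders, decoder, heads, task_inputs):
        self.encoders = encoders        # one encoder per input modality
        self.decoder = decoder          # single decoder shared by all tasks
        self.heads = heads              # one output head per task
        self.task_inputs = task_inputs  # modalities each task consumes

    def forward(self, task: str, inputs: Dict[str, Vector]) -> Vector:
        # Encode only the modalities that this task uses.
        encoded = [self.encoders[m](inputs[m]) for m in self.task_inputs[task]]
        # The same decoder parameters are applied regardless of the task.
        hidden = self.decoder(mean_pool(encoded))
        # Only the final head is task-specific.
        return self.heads[task](hidden)


def joint_loss(model: ToyUnifiedModel,
               batch: List[Tuple[str, Dict[str, Vector], Vector]]) -> float:
    """Sum of per-task squared errors (assumed loss, for illustration)."""
    total = 0.0
    for task, inputs, target in batch:
        pred = model.forward(task, inputs)
        total += sum((p - t) ** 2 for p, t in zip(pred, target))
    return total


model = ToyUnifiedModel(
    encoders={"image": linear([[1.0, 0.0], [0.0, 1.0]]),
              "text": linear([[2.0, 0.0], [0.0, 2.0]])},
    decoder=linear([[1.0, 1.0], [1.0, -1.0]]),
    heads={"detection": linear([[1.0, 0.0]]),
           "multimodal_reasoning": linear([[0.0, 1.0]])},
    task_inputs={"detection": ["image"],
                 "multimodal_reasoning": ["image", "text"]},
)

batch = [
    ("detection", {"image": [1.0, 2.0]}, [3.0]),
    ("multimodal_reasoning", {"image": [1.0, 2.0], "text": [3.0, 4.0]}, [0.0]),
]
print(joint_loss(model, batch))
```

In this sketch, adding a task means adding one entry to `heads` and `task_inputs` while the encoders and decoder are reused, which is the parameter-sharing idea the abstract contrasts with fine-tuning a separate model per task.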
