Title: Deep Multimodal Neural Architecture Search

Apr 25, 2020

Zhou Yu, Yuhao Cui, Jun Yu, Meng Wang, Dacheng Tao, Qi Tian

Abstract: Designing effective neural networks is fundamentally important in deep multimodal learning. Most existing works focus on a single task and rely on manually designed neural architectures, which are highly task-specific and hard to generalize to other tasks. In this paper, we devise a generalized deep multimodal neural architecture search (MMnas) framework for various multimodal learning tasks. Given multimodal input, we first define a set of primitive operations and then construct a unified backbone based on a deep encoder-decoder, where each encoder or decoder block corresponds to an operation searched from a predefined operation pool. On top of the unified backbone, we attach task-specific heads to tackle different multimodal learning tasks. By using a gradient-based NAS algorithm, the optimal architectures for different tasks are learned efficiently. Extensive ablation studies, comprehensive analysis, and superior experimental results show that MMnasNet significantly outperforms existing state-of-the-art approaches across three multimodal learning tasks (over five datasets): visual question answering, image-text matching, and visual grounding. Code will be made available.
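To make the idea of searching each block's operation with gradients concrete, here is a minimal toy sketch. It assumes a DARTS-style softmax relaxation over the operation pool, which may differ from the NAS algorithm the paper actually uses. The operation names, the scalar inputs, and the helper functions (`mixed_block`, `search_step`, `derive_operation`) are hypothetical illustrations, not the authors' code.

```python
import math

# Hypothetical scalar stand-ins for the primitive operations in the pool.
OPERATION_POOL = {
    "identity": lambda x: x,
    "double": lambda x: 2.0 * x,
    "negate": lambda x: -x,
}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_block(x, alpha):
    """Blend every candidate operation's output using softmax(alpha)."""
    weights = softmax(alpha)
    outputs = [op(x) for op in OPERATION_POOL.values()]
    mixed = sum(w * o for w, o in zip(weights, outputs))
    return mixed, weights, outputs


def search_step(alpha, x, target, lr=0.5):
    """One gradient-descent step on alpha for the loss (mixed - target)^2."""
    mixed, weights, outputs = mixed_block(x, alpha)
    dloss = 2.0 * (mixed - target)
    # d(mixed)/d(alpha_i) = w_i * (o_i - mixed) for a softmax-weighted sum.
    return [a - lr * dloss * w * (o - mixed)
            for a, w, o in zip(alpha, weights, outputs)]


def derive_operation(alpha):
    """Pick the operation with the largest architecture weight."""
    names = list(OPERATION_POOL)
    return names[max(range(len(alpha)), key=alpha.__getitem__)]


# Toy search: the target is 2 * x, so "double" should win the block.
alpha = [0.0, 0.0, 0.0]
for _ in range(50):
    alpha = search_step(alpha, x=1.0, target=2.0)
print(derive_operation(alpha))
```

In a full system, each encoder or decoder block would presumably hold its own architecture weights over real multimodal operations, and the task-specific head would supply the loss. The sketch keeps only the select-by-gradient mechanism.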

* 10 pages, 4 figures
