Title: Learning Multimodal VAEs through Mutual Supervision

Jul 01, 2021

Tom Joy, Yuge Shi, Philip H. S. Torr, Tom Rainforth, Sebastian M. Schmon, N. Siddharth

Abstract: Multimodal VAEs seek to model the joint distribution over heterogeneous data (e.g. vision, language), whilst also capturing a shared representation across such modalities. Prior work has typically combined information from the modalities by reconciling idiosyncratic representations directly in the recognition model through explicit products, mixtures, or other such factorisations. Here we introduce a novel alternative, the MEME, that avoids such explicit combinations by repurposing semi-supervised VAEs to combine information between modalities implicitly through mutual supervision. This formulation naturally allows learning from partially-observed data where some modalities can be entirely missing, something that most existing approaches either cannot handle, or do so to a limited extent. We demonstrate that MEME outperforms baselines on standard metrics across both partial and complete observation schemes on the MNIST-SVHN (image-image) and CUB (image-text) datasets. We also contrast the quality of the representations learnt by mutual supervision against standard approaches and observe interesting trends in its ability to capture relatedness between data.

View paper on OpenReview
