Title: Foundations and Recent Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

Sep 07, 2022

Paul Pu Liang, Amir Zadeh, Louis-Philippe Morency

Abstract: Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design computer agents with intelligent capabilities such as understanding, reasoning, and learning through integrating multiple communicative modalities, including linguistic, acoustic, visual, tactile, and physiological messages. With the recent interest in video understanding, embodied autonomous agents, text-to-image generation, and multisensor fusion in application domains such as healthcare and robotics, multimodal machine learning has brought unique computational and theoretical challenges to the machine learning community given the heterogeneity of data sources and the interconnections often found between modalities. However, the breadth of progress in multimodal research has made it difficult to identify the common themes and open questions in the field. By synthesizing a broad range of application domains and theoretical frameworks from both historical and recent perspectives, this paper is designed to provide an overview of the computational and theoretical foundations of multimodal machine learning. We start by defining two key principles of modality heterogeneity and interconnections that have driven subsequent innovations, and propose a taxonomy of 6 core technical challenges: representation, alignment, reasoning, generation, transference, and quantification, covering historical and recent trends. Recent technical achievements will be presented through the lens of this taxonomy, allowing researchers to understand the similarities and differences across new approaches. We end by motivating several open problems for future research as identified by our taxonomy.
