Abstract: Reliable multimodal learning in the presence of noisy data is a widely studied problem, especially in safety-critical applications. Many reliable multimodal methods focus on addressing either modality-specific or cross-modality noise. However, they fail to handle the coexistence of both types of noise effectively. Moreover, the lack of comprehensive consideration of noise at both the global and individual levels limits their reliability. To address these issues, a reliable multimodal classification method dubbed Multi-Level Inter-Class Confusing Information Removal Network (MICINet) is proposed. MICINet achieves reliable removal of both types of noise by unifying them into the concept of Inter-class Confusing Information (\textit{ICI}) and eliminating it at both the global and individual levels. Specifically, MICINet first reliably learns the global \textit{ICI} distribution through the proposed \textbf{\textit{Global ICI Learning module}}. Then, it introduces the \textbf{\textit{Global-guided Sample ICI Learning module}} to efficiently remove global-level \textit{ICI} from sample features using the learned global \textit{ICI} distribution. Subsequently, the \textbf{\textit{Sample-adaptive Cross-modality Information Compensation module}} is designed to reliably remove individual-level \textit{ICI} from each sample. It does so through interpretable cross-modality information compensation, which exploits the complementary relationship between discriminative features and \textit{ICI} and perceives the relative quality of modalities via their relative discriminative power. Experiments on four datasets demonstrate that MICINet outperforms other state-of-the-art reliable multimodal classification methods under various noise conditions.
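To make the idea of removing a globally learned \textit{ICI} component from sample features concrete, below is a minimal sketch. It assumes, purely for illustration, that the global \textit{ICI} distribution can be summarized as a single learnable direction per modality and removed by orthogonal projection; the class `GlobalICIRemoval` and this simplification are hypothetical and do not reproduce MICINet's actual modules.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalICIRemoval(nn.Module):
    """Illustrative sketch: remove a globally learned inter-class
    confusing (ICI) direction from per-sample modality features via
    orthogonal projection. Names and design are assumptions, not
    MICINet's actual architecture."""

    def __init__(self, feat_dim: int):
        super().__init__()
        # Assumed simplification: the global ICI distribution is
        # summarized by one learnable direction for this modality.
        self.global_ici = nn.Parameter(torch.randn(feat_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, feat_dim) features of one modality.
        u = F.normalize(self.global_ici, dim=0)      # unit ICI direction
        proj = (x @ u).unsqueeze(-1) * u             # component along ICI
        return x - proj                              # feature with global-level ICI removed


if __name__ == "__main__":
    x = torch.randn(8, 64)
    clean = GlobalICIRemoval(64)(x)
    print(clean.shape)  # torch.Size([8, 64])
```

A per-sample (individual-level) variant would additionally condition the removed component on each sample, e.g., by compensating with discriminative information borrowed from higher-quality modalities, as the abstract describes.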
Abstract: Integrating complementary information from different data modalities can yield representations with stronger expressive ability. However, data quality varies across multimodal samples, highlighting the need to learn reliable multimodal representations, especially in safety-critical applications. This paper focuses on an aspect that existing methods in this domain commonly overlook: the importance of network dynamics and adaptability in providing reliable results for diverse samples. Specifically, it highlights the model's ability to dynamically adjust its capacity and behaviour according to each sample, using the adjusted network to predict that sample. To this end, we propose a novel framework for reliable multimodal classification termed Quality-adaptive Dynamic Multimodal Network (QADM-Net). QADM-Net first introduces a confidence-guided dynamic depth mechanism to achieve an appropriate network capacity. This mechanism adjusts the network depth according to the difficulty of each sample, which is determined by the quality of its modalities. Subsequently, we develop an informativeness-based dynamic parameters mechanism that enables QADM-Net to perform sample-specific inference conditioned on the feature-level quality variation present in each sample's feature vectors. In this way, QADM-Net adapts its capacity and behaviour to each sample by investigating quality variation at both the modality and feature levels, thus enhancing the reliability of its classification results. Experiments conducted on four datasets demonstrate that QADM-Net significantly outperforms state-of-the-art methods in classification performance and exhibits strong adaptability to data of diverse quality.
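The confidence-guided dynamic depth idea can be illustrated with an early-exit sketch: high-confidence (easy) samples stop after fewer layers, so network capacity tracks sample difficulty. Everything below (`DynamicDepthNet`, the confidence threshold, the batch-level exit rule) is an assumed approximation of the mechanism, not QADM-Net's actual design.

```python
import torch
import torch.nn as nn

class DynamicDepthNet(nn.Module):
    """Illustrative early-exit sketch of confidence-guided dynamic depth:
    deepen the network only while prediction confidence is low. All names
    and choices here are hypothetical."""

    def __init__(self, dim: int, n_classes: int, max_depth: int = 4,
                 conf_threshold: float = 0.9):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(max_depth)
        )
        # One classifier head per depth so the network can exit early.
        self.heads = nn.ModuleList(nn.Linear(dim, n_classes) for _ in range(max_depth))
        self.conf_threshold = conf_threshold  # assumed confidence cutoff

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = None
        for block, head in zip(self.blocks, self.heads):
            x = block(x)
            logits = head(x)
            conf = logits.softmax(dim=-1).amax(dim=-1)  # max class probability
            # Stop deepening once every sample in the batch is confident;
            # a per-sample exit would route each sample individually.
            if bool((conf >= self.conf_threshold).all()):
                break
        return logits


if __name__ == "__main__":
    net = DynamicDepthNet(dim=32, n_classes=5)
    out = net(torch.randn(16, 32))
    print(out.shape)  # torch.Size([16, 5])
```

The informativeness-based dynamic parameters mechanism would go further, generating or modulating layer weights per sample from its feature vectors; that step is omitted here since the abstract does not specify its form.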