Abstract: Deep neural networks allow for fusion of high-level features from multiple modalities and have become a promising end-to-end solution for multi-modal sensor fusion. While the recently proposed gating architectures improve the conventional fusion mechanisms employed in CNNs, these models are not always resilient, particularly in the presence of sensor failures. This paper shows that the existing gating architectures fail to robustly learn the fusion weights that critically gate different modalities, leading to the issue of fusion weight inconsistency. We propose a new gating architecture that incorporates an auxiliary model to regularize the main model such that the fusion weight for each sensory modality can be robustly learned. As a result, this new auxiliary-model regulated architecture and its variants outperform the existing non-gating and gating fusion architectures under both clean and corrupted sensory inputs resulting from sensor failures. The performance gains are particularly significant in the latter case.
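The gating idea described above can be illustrated with a minimal NumPy sketch. This is not the paper's architecture (the function and parameter names here are hypothetical, and real models would learn the gating head end-to-end with the feature extractors): a small linear gating head scores each modality from the concatenated features, a softmax turns the scores into fusion weights, and the fused representation is the weighted sum of per-modality features.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - np.max(x))
    return e / e.sum()

def gated_fusion(features, gate_weights, gate_bias):
    """Illustrative gated fusion of per-modality feature vectors.

    features:     dict mapping modality name -> 1-D feature vector
                  (all vectors the same length)
    gate_weights: (num_modalities, total_feature_len) matrix of a
                  hypothetical linear gating head
    gate_bias:    (num_modalities,) bias of that gating head
    Returns the fused feature vector and the per-modality fusion weights.
    """
    names = sorted(features)
    concat = np.concatenate([features[n] for n in names])
    scores = gate_weights @ concat + gate_bias   # one raw score per modality
    w = softmax(scores)                          # fusion weights sum to 1
    fused = sum(wi * features[n] for wi, n in zip(w, names))
    return fused, dict(zip(names, w))
```

With zero-initialized gating parameters the weights default to a uniform average of the modalities; training would move them so that an unreliable (e.g. failed) sensor is down-weighted.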
Abstract: Sensor fusion is a key technology that integrates various sensory inputs to allow for robust decision making in many applications such as autonomous driving and robot control. Deep neural networks have been adopted for sensor fusion in a body of recent studies. Among these, the so-called netgated architecture was proposed, which has demonstrated improved performance over conventional convolutional neural networks (CNNs). In this paper, we address several limitations of the baseline netgated architecture by proposing two further optimized architectures: a coarser-grained gated architecture employing (feature) group-level fusion weights, and a two-stage gated architecture leveraging both the group-level and feature-level fusion weights. Using driving mode prediction and human activity recognition datasets, we demonstrate the significant performance improvements brought by the proposed gated architectures, as well as their robustness in the presence of sensor noise and failures.
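The two-stage idea sketched in this abstract, combining group-level and feature-level fusion weights, can be illustrated with a small NumPy example. This is an assumption-laden sketch, not the paper's exact model: the function below (hypothetical names throughout) applies softmax group-level weights to whole feature groups and sigmoid feature-level weights to individual features, where in a real network both sets of raw scores would come from learned gating layers.

```python
import numpy as np

def two_stage_gating(group_feats, group_scores, feat_scores):
    """Illustrative two-stage gating over sensor feature groups.

    group_feats:  list of 1-D arrays, one feature group per sensor
    group_scores: (num_groups,) raw scores -> softmax group-level weights
    feat_scores:  list of 1-D arrays (same shapes as group_feats)
                  -> sigmoid feature-level weights
    Returns the concatenated, twice-gated feature vector.
    """
    # Stage 1: coarse group-level fusion weights (sum to 1 across groups)
    gw = np.exp(group_scores - group_scores.max())
    gw = gw / gw.sum()
    # Stage 2: fine feature-level weights within each group (0..1 each)
    gated = [g * fg * (1.0 / (1.0 + np.exp(-fs)))
             for g, fg, fs in zip(gw, group_feats, feat_scores)]
    return np.concatenate(gated)
```

The coarser group-level stage lets the model suppress an entire failed sensor at once, while the feature-level stage retains fine-grained control within the surviving groups.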