This paper presents a data-driven approach to multi-modal fusion in which the optimal features for each sensor are selected from a hidden space shared by the different modalities. The existence of this shared hidden space is then used to detect damaged sensors and safeguard the performance of the system. Experimental results show that the approach can make the system robust to noisy or damaged sensors without requiring human intervention to inform the system about the damage.
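
To make the idea concrete, the following is a minimal sketch of how a shared hidden space could be used to flag a damaged sensor. It is an illustration under stated assumptions, not the method of this paper: each sensor is assumed to observe a linear function of one common latent vector, the per-sensor encoders are taken to be pseudo-inverses of those linear maps (in a data-driven system they would be learned), and the names `detect_damaged`, `fuse`, `decoders`, and the threshold value are hypothetical.

```python
import numpy as np

def detect_damaged(latents, threshold):
    """Flag sensors whose latent estimate disagrees with the cross-sensor consensus.

    latents: array of shape (n_sensors, latent_dim), one latent estimate per sensor.
    threshold: hypothetical distance above which a sensor is treated as damaged.
    """
    # The coordinate-wise median is a simple robust consensus (an assumption here).
    consensus = np.median(latents, axis=0)
    scores = np.linalg.norm(latents - consensus, axis=1)
    flagged = [int(i) for i in np.flatnonzero(scores > threshold)]
    return flagged, scores

def fuse(latents, flagged):
    """Average the latent estimates of the sensors that were not flagged."""
    keep = [i for i in range(len(latents)) if i not in flagged]
    return latents[keep].mean(axis=0)

# Synthetic setup (all values are illustrative assumptions).
rng = np.random.default_rng(0)
latent_dim, obs_dim, n_sensors = 3, 8, 4
z = rng.normal(size=latent_dim)  # the common hidden state

# Each sensor observes the hidden state through its own linear map plus small noise.
decoders = [rng.normal(size=(obs_dim, latent_dim)) for _ in range(n_sensors)]
readings = [A @ z + 0.01 * rng.normal(size=obs_dim) for A in decoders]

# Simulate damage: sensor 2 outputs pure noise unrelated to the hidden state.
readings[2] = 10.0 * rng.normal(size=obs_dim)

# Map every reading back into the shared space and compare the estimates.
latents = np.stack([np.linalg.pinv(A) @ x for A, x in zip(decoders, readings)])
flagged, scores = detect_damaged(latents, threshold=0.5)
fused = fuse(latents, flagged)

print("flagged sensors:", flagged)
print("disagreement scores:", np.round(scores, 3))
```

Because healthy sensors map to nearly the same point in the shared space, a sensor whose estimate lies far from the consensus can be excluded from fusion automatically, with no external signal about which sensor failed. The sketch assumes a majority of sensors remain healthy; otherwise the median consensus is itself unreliable.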