Abstract: Cross-modal hashing between 2D images and 3D point-cloud data is of growing importance for real-world retrieval systems. Simply applying existing cross-modal approaches to this new task fails to adequately capture latent multi-modal semantics or to effectively bridge the modality gap between 2D and 3D. To address these issues without relying on hand-crafted labels, we propose contrastive masked autoencoder-based self-supervised hashing (CMAH) for retrieval between images and point-cloud data. We first contrast 2D-3D pairs and explicitly constrain them into a joint Hamming space. This contrastive learning process ensures strong discriminability of the generated hash codes and effectively reduces the modality gap. Moreover, we employ multi-modal autoencoders to enhance the model's understanding of multi-modal semantics: by completing masked image and point-cloud modeling tasks, the model is encouraged to capture more localized cues. In addition, the proposed multi-modal fusion block enables fine-grained interactions among the modalities. Extensive experiments on three public datasets demonstrate that CMAH significantly outperforms all baseline methods.
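To make the core idea of contrasting 2D-3D pairs in a joint Hamming space concrete, the following is a minimal sketch, not the authors' CMAH implementation: it assumes two modality-specific backbones producing 512-dimensional features, a tanh-relaxed hash head, 64-bit codes, and a symmetric InfoNCE-style loss with an assumed temperature of 0.07. All class and function names here are illustrative.

```python
# Minimal sketch of cross-modal contrastive hashing between image and
# point-cloud features (assumed setup; not the paper's exact architecture).
import torch
import torch.nn as nn
import torch.nn.functional as F


class HashHead(nn.Module):
    """Projects backbone features to K-bit relaxed hash codes in [-1, 1]."""

    def __init__(self, feat_dim: int, num_bits: int):
        super().__init__()
        self.proj = nn.Linear(feat_dim, num_bits)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # tanh relaxes the binary constraint; sign() is applied at retrieval time.
        return torch.tanh(self.proj(x))


def cross_modal_contrastive_loss(img_codes: torch.Tensor,
                                 pcd_codes: torch.Tensor,
                                 temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE over a batch of matched image / point-cloud pairs."""
    img = F.normalize(img_codes, dim=-1)
    pcd = F.normalize(pcd_codes, dim=-1)
    logits = img @ pcd.t() / temperature             # (B, B) similarity matrix
    targets = torch.arange(img.size(0), device=img.device)
    loss_i2p = F.cross_entropy(logits, targets)      # image -> point cloud
    loss_p2i = F.cross_entropy(logits.t(), targets)  # point cloud -> image
    return 0.5 * (loss_i2p + loss_p2i)


if __name__ == "__main__":
    # Toy batch: stand-in backbone features for 8 matched 2D-3D pairs.
    img_feats = torch.randn(8, 512)   # e.g. from an image encoder
    pcd_feats = torch.randn(8, 512)   # e.g. from a point-cloud encoder
    head_2d, head_3d = HashHead(512, 64), HashHead(512, 64)
    loss = cross_modal_contrastive_loss(head_2d(img_feats), head_3d(pcd_feats))
    print(f"contrastive hashing loss: {loss.item():.4f}")

    # At retrieval time, binarize the codes and compare Hamming distances.
    binary_img = torch.sign(head_2d(img_feats))
    binary_pcd = torch.sign(head_3d(pcd_feats))
    hamming = (binary_img.size(1) - binary_img @ binary_pcd.t()) / 2
```

The symmetric loss pulls each matched 2D-3D pair together while pushing apart mismatched pairs within the batch, which is one common way to realize the "joint Hamming space" constraint described above; the masked autoencoding and multi-modal fusion components of CMAH are not reflected in this sketch.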