
Kaiyi Luo

RREH: Reconstruction Relations Embedded Hashing for Semi-Paired Cross-Modal Retrieval

May 28, 2024

Jianzong Wang, Haoxiang Shi, Kaiyi Luo, Xulong Zhang, Ning Cheng, Jing Xiao


Abstract: Known for efficient computation and easy storage, hashing has been extensively explored in cross-modal retrieval. Most current hashing models assume a direct one-to-one correspondence between data points across modalities. In practice, however, cross-modal correspondence may be only partially available. In this research, we introduce a novel unsupervised hashing technique for semi-paired cross-modal retrieval, named Reconstruction Relations Embedded Hashing (RREH). RREH assumes that multi-modal data share a common subspace. For paired data, RREH explores the latent consistent information of heterogeneous modalities by seeking a shared representation. For unpaired data, to effectively capture latent discriminative features, the high-order relationships between unpaired data and anchors, computed by efficient linear reconstruction, are embedded into the latent subspace. The anchors are sampled from paired data, which improves the efficiency of hash learning. RREH trains the underlying features and the binary codes in a unified framework while preserving the high-order reconstruction relations. With a well-devised objective function and a discrete optimization algorithm, RREH is designed to be scalable, making it suitable for large-scale datasets and efficient cross-modal retrieval. In the evaluation, the proposed method is tested on partially paired data to establish its superiority over several existing methods.

* Accepted by the 20th International Conference on Intelligent Computing (ICIC 2024)
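The abstract does not state RREH's objective, so what follows is only a minimal NumPy sketch of two ingredients it names, under stated assumptions. It assumes the "efficient linear reconstruction" is a ridge-regularized least-squares fit of each unpaired sample onto the anchors (the regularizer `lam` and the functions `reconstruction_coefficients`, `binarize`, and `hamming_distances` are hypothetical), and it shows how sign-binarized codes are compared by Hamming distance at retrieval time. It does not reproduce the paper's shared-subspace learning or its discrete optimization.

```python
import numpy as np


def reconstruction_coefficients(X, anchors, lam=1e-2):
    """Reconstruct each row of X as a linear combination of the anchors.

    Assumption (not from the paper): ridge-regularized least squares,
        min_W ||X - W @ anchors||_F^2 + lam * ||W||_F^2,
    whose closed form is W = X A^T (A A^T + lam I)^{-1}.

    X:       (n, d) unpaired samples
    anchors: (m, d) anchors sampled from the paired data
    returns: (n, m) reconstruction coefficients W
    """
    m = anchors.shape[0]
    gram = anchors @ anchors.T + lam * np.eye(m)  # (m, m), symmetric positive definite
    # Solve gram @ W^T = anchors @ X^T rather than forming an explicit inverse.
    return np.linalg.solve(gram, anchors @ X.T).T


def binarize(Z):
    """Map real-valued latent codes to {-1, +1} hash bits (zeros map to +1)."""
    return np.where(Z >= 0, 1, -1).astype(np.int8)


def hamming_distances(Bq, Bdb):
    """Hamming distances between {-1, +1} query codes and database codes.

    For length-k codes, dist = (k - <b_q, b_db>) / 2.
    """
    k = Bq.shape[1]
    inner = Bq.astype(np.int32) @ Bdb.astype(np.int32).T
    return (k - inner) // 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    anchors = rng.standard_normal((8, 16))     # 8 hypothetical anchors in 16-D
    X = rng.standard_normal((5, 8)) @ anchors  # samples lying in the anchor span
    W = reconstruction_coefficients(X, anchors, lam=1e-8)
    print("max reconstruction error:", np.abs(W @ anchors - X).max())
```

The linear solve involves only an m-by-m system for m anchors, so its cost grows linearly with the number of unpaired samples; this is the usual motivation for relating samples to a small anchor set instead of to each other.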


Contrastive Latent Space Reconstruction Learning for Audio-Text Retrieval

Sep 16, 2023

Kaiyi Luo, Xulong Zhang, Jianzong Wang, Huaxiong Li, Ning Cheng, Jing Xiao


Abstract: Cross-modal retrieval (CMR) has been extensively applied in various domains, such as multimedia search engines and recommendation systems. Most existing CMR methods focus on image-to-text retrieval, whereas audio-to-text retrieval, a less explored domain, poses a great challenge owing to the difficulty of uncovering discriminative features from audio clips and texts. Existing studies are limited in two ways: 1) Most researchers use contrastive learning to construct a common subspace in which similarities among data can be measured. However, they consider only cross-modal transformation and neglect intra-modal separability. In addition, the temperature parameter is not adaptively adjusted according to semantic guidance, which degrades performance. 2) These methods do not take latent representation reconstruction into account, although it is essential for semantic alignment. This paper introduces a novel audio-text oriented CMR approach, termed Contrastive Latent Space Reconstruction Learning (CLSR). CLSR improves contrastive representation learning by taking intra-modal separability into account and adopting an adaptive temperature control strategy. Moreover, latent representation reconstruction modules are embedded into the CMR framework, which improves modal interaction. Experiments comparing CLSR with several state-of-the-art methods on two audio-text datasets validate its superiority.

* Accepted by the 35th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2023)
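The abstract likewise leaves CLSR's loss unspecified. The sketch below is a symmetric InfoNCE-style loss in NumPy in which, as an assumption, intra-modal separability is encouraged by adding same-modality items (other audio clips for an audio anchor, other captions for a text anchor) as extra negatives in each softmax denominator. The temperature `tau` is a fixed, hypothetical hyperparameter here; CLSR's adaptive temperature control and its latent reconstruction modules are not reproduced, and the function names are illustrative.

```python
import numpy as np


def l2_normalize(X):
    """Scale each row to unit length so dot products are cosine similarities."""
    return X / np.linalg.norm(X, axis=1, keepdims=True)


def logsumexp_rows(M):
    """Numerically stable log(sum(exp(M), axis=1))."""
    mx = M.max(axis=1, keepdims=True)
    return (mx + np.log(np.exp(M - mx).sum(axis=1, keepdims=True)))[:, 0]


def contrastive_loss(audio, text, tau=0.07):
    """Symmetric InfoNCE with extra intra-modal negatives (an assumed variant).

    audio, text: (n, d) embeddings; row i of each forms a matched pair.
    """
    a, t = l2_normalize(audio), l2_normalize(text)
    n = a.shape[0]
    self_mask = np.eye(n, dtype=bool)

    cross = a @ t.T / tau                              # audio-to-text logits
    aa = np.where(self_mask, -np.inf, a @ a.T / tau)   # audio-to-audio, self excluded
    tt = np.where(self_mask, -np.inf, t @ t.T / tau)   # text-to-text, self excluded
    pos = np.diag(cross)                               # matched-pair logits

    # Audio anchors: negatives are unmatched texts plus other audio clips.
    loss_a = logsumexp_rows(np.concatenate([cross, aa], axis=1)) - pos
    # Text anchors: negatives are unmatched audio clips plus other texts.
    loss_t = logsumexp_rows(np.concatenate([cross.T, tt], axis=1)) - pos
    return 0.5 * (loss_a.mean() + loss_t.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    audio = rng.standard_normal((8, 32))
    text = audio + 0.01 * rng.standard_normal((8, 32))  # near-duplicate "captions"
    print("aligned:", contrastive_loss(audio, text))
    print("mismatched:", contrastive_loss(audio, text[::-1]))
```

Lowering `tau` sharpens the softmax over candidates, so a single fixed value trades off how strongly hard negatives are penalized; that sensitivity is what the abstract's adaptive temperature strategy is meant to address.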
