Abstract:Composed Image Retrieval (CIR) aims to retrieve a target image from a query composed of a reference image and modification text. Recent training-free zero-shot methods often employ Multimodal Large Language Models (MLLMs) with Chain-of-Thought (CoT) to compose a target image description for retrieval. However, due to the fuzzy matching nature of ZS-CIR, the generated description is prone to semantic bias relative to the target image. We propose SDR-CIR, a training-free Semantic Debias Ranking method based on CoT reasoning. First, Selective CoT guides the MLLM to extract visual content relevant to the modification text during image understanding, thereby reducing visual noise at the source. We then introduce a Semantic Debias Ranking with two steps, Anchor and Debias, to mitigate semantic bias. In the Anchor step, we fuse reference image features with target description features to reinforce useful semantics and supplement omitted cues. In the Debias step, we explicitly model the visual semantic contribution of the reference image to the description and incorporate it into the similarity score as a penalty term. By supplementing omitted cues while suppressing redundancy, SDR-CIR mitigates semantic bias and improves retrieval performance. Experiments on three standard CIR benchmarks show that SDR-CIR achieves state-of-the-art results among one-stage methods while maintaining high efficiency. The code is publicly available at https://github.com/suny105/SDR-CIR.
Abstract:Recently, 3D backdoor attacks have posed a substantial threat to 3D Deep Neural Networks (3D DNNs) designed for 3D point clouds, which are extensively deployed in various security-critical applications. Although the existing 3D backdoor attacks achieved high attack performance, they remain vulnerable to preprocessing-based defenses (e.g., outlier removal and rotation augmentation) and are prone to detection by human inspection. In pursuit of a more challenging-to-defend and stealthy 3D backdoor attack, this paper introduces the Stealthy and Robust Backdoor Attack (SRBA), which ensures robustness and stealthiness through intentional design considerations. The key insight of our attack involves applying a uniform shift to the additional point features of point clouds (e.g., reflection intensity) widely utilized as part of inputs for 3D DNNs as the trigger. Without altering the geometric information of the point clouds, our attack ensures visual consistency between poisoned and benign samples, and demonstrate robustness against preprocessing-based defenses. In addition, to automate our attack, we employ Bayesian Optimization (BO) to identify the suitable trigger. Extensive experiments suggest that SRBA achieves an attack success rate (ASR) exceeding 94% in all cases, and significantly outperforms previous SOTA methods when multiple preprocessing operations are applied during training.