Title: Decoupling Contrastive Decoding: Robust Hallucination Mitigation in Multimodal Large Language Models

Apr 09, 2025

Wei Chen, Xin Yan, Bin Wen, Fan Yang, Tingting Gao, Di Zhang, Long Chen

Abstract: Although multimodal large language models (MLLMs) exhibit remarkable reasoning capabilities on complex multimodal understanding tasks, they still suffer from the notorious hallucination issue: generating outputs misaligned with obvious visual or factual evidence. Currently, training-based solutions, like direct preference optimization (DPO), leverage paired preference data to suppress hallucinations. However, they risk sacrificing general reasoning capabilities due to likelihood displacement. Meanwhile, training-free solutions, like contrastive decoding, achieve this goal by subtracting the hallucination pattern estimated from a distorted input. Yet, these handcrafted perturbations (e.g., adding noise to images) may poorly capture authentic hallucination patterns. To avoid these weaknesses of existing methods and achieve robust hallucination mitigation (i.e., mitigation that maintains general reasoning performance), we propose a novel framework: Decoupling Contrastive Decoding (DCD). Specifically, DCD decouples the learning of positive and negative samples in preference datasets, and trains separate positive and negative image projections within the MLLM. The negative projection implicitly models real hallucination patterns, which enables vision-aware negative images in the contrastive decoding inference stage. DCD alleviates likelihood displacement by avoiding pairwise optimization and generalizes robustly without handcrafted degradation. Extensive ablations across hallucination benchmarks and general reasoning tasks demonstrate the effectiveness of DCD: it matches DPO's hallucination suppression while preserving general capabilities, and it outperforms handcrafted contrastive decoding methods.
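As a rough illustration of the contrastive decoding step the abstract refers to, the sketch below combines next-token scores from a "positive" branch and a "negative" branch. It assumes the combination rule commonly used in earlier contrastive decoding work, (1 + alpha) * log p_pos - alpha * log p_neg; the abstract does not state DCD's exact rule, so this may differ from the paper. The function names, the `alpha` parameter, and the toy logits are all hypothetical. In DCD, per the abstract, the negative-branch scores would come from the learned negative image projection rather than from a noised image.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def contrastive_scores(pos_logits, neg_logits, alpha=1.0):
    """Assumed rule: (1 + alpha) * log p_pos - alpha * log p_neg.

    alpha = 0 reduces to ordinary decoding on the positive branch.
    """
    pos = log_softmax(pos_logits)
    neg = log_softmax(neg_logits)
    return [(1 + alpha) * p - alpha * n for p, n in zip(pos, neg)]


def greedy_token(pos_logits, neg_logits, alpha=1.0):
    """Index of the highest-scoring token under the contrastive rule."""
    scores = contrastive_scores(pos_logits, neg_logits, alpha)
    return max(range(len(scores)), key=scores.__getitem__)


# Toy 3-token vocabulary (hypothetical numbers).
# Token 1 is slightly preferred by the positive branch but strongly
# preferred by the negative (hallucination-modelling) branch.
pos_logits = [2.0, 2.2, 0.1]
neg_logits = [0.5, 3.0, 0.1]

plain_choice = greedy_token(pos_logits, neg_logits, alpha=0.0)
contrastive_choice = greedy_token(pos_logits, neg_logits, alpha=1.0)
```

In this toy case, ordinary decoding picks token 1, while the contrastive rule penalizes it for being favoured by the negative branch and picks token 0 instead.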

* 13 pages, 4 figures
