Abstract: Hallucination remains a fundamental challenge for Multimodal Large Language Models (MLLMs). While Direct Preference Optimization (DPO) is a key alignment framework, existing approaches often rely heavily on costly external evaluators for scoring or rewriting, which introduces off-policy learnability gaps and discretization loss. Lacking access to the model's internal states, such feedback overlooks the fine-grained conflicts between modalities that lead to hallucinations during generation. To address this issue, we propose IRIS (Implicit Reward-Guided Internal Sifting), which leverages continuous implicit rewards in the native log-probability space to preserve full information density and capture internal modal competition. This on-policy paradigm eliminates learnability gaps by using self-generated preference pairs. By sifting these pairs based on multimodal implicit rewards, IRIS ensures that optimization is driven by signals that directly resolve modal conflicts. Extensive experiments demonstrate that IRIS achieves highly competitive performance on key hallucination benchmarks using only 5.7k samples, without requiring any external feedback during preference alignment. These results confirm that IRIS provides an efficient and principled paradigm for mitigating MLLM hallucinations.
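
To make the idea of a continuous implicit reward in log-probability space concrete, the following is a minimal sketch and not the IRIS procedure itself: the abstract does not state the exact sifting rule or how the multimodal reward is formed. It assumes the standard DPO implicit reward, r(x, y) = β (log π_θ(y | x) − log π_ref(y | x)), where x stands for the image plus prompt. The names `Candidate`, `implicit_reward`, `sift_pair`, and the `min_margin` threshold are hypothetical, and the log-probabilities in the usage lines are made-up numbers.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    """A self-generated response with its summed token log-probabilities."""
    text: str
    policy_logp: float  # log pi_theta(y | image, prompt), current policy
    ref_logp: float     # log pi_ref(y | image, prompt), frozen reference


def implicit_reward(c: Candidate, beta: float = 0.1) -> float:
    """Standard DPO implicit reward: beta * (policy log-prob - reference log-prob)."""
    return beta * (c.policy_logp - c.ref_logp)


def sift_pair(
    candidates: List[Candidate], beta: float = 0.1, min_margin: float = 0.0
) -> Optional[Tuple[Candidate, Candidate, float]]:
    """Pick the highest- and lowest-reward candidates as (chosen, rejected).

    Hypothetical sifting rule: the pair is kept only if the reward margin
    exceeds min_margin; otherwise None is returned and the pair is dropped.
    """
    if len(candidates) < 2:
        return None
    ranked = sorted(candidates, key=lambda c: implicit_reward(c, beta))
    rejected, chosen = ranked[0], ranked[-1]
    margin = implicit_reward(chosen, beta) - implicit_reward(rejected, beta)
    if margin <= min_margin:
        return None
    return chosen, rejected, margin


# Illustrative (made-up) log-probabilities for three sampled responses.
samples = [
    Candidate("A", policy_logp=-12.0, ref_logp=-15.0),
    Candidate("B", policy_logp=-20.0, ref_logp=-18.0),
    Candidate("C", policy_logp=-10.0, ref_logp=-10.0),
]
pair = sift_pair(samples)
```

Because the reward stays a real number rather than a discrete score from an external judge, the margin between candidates can be used directly as a filtering signal, which is the property the abstract emphasizes.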




Abstract: Evolutionary multitasking (EMT) has attracted much attention in recent years. It aims to handle multiple optimization tasks simultaneously within limited computing resources, assisted by inter-task knowledge transfer techniques. Numerous multitask evolutionary algorithms (MTEAs) for solving multitask optimization (MTO) problems have been proposed in the EMT field, but a comprehensive software platform that helps researchers evaluate MTEA performance on benchmark MTO problems and explore real-world applications is still lacking. To address this issue, we introduce the first open-source optimization platform for EMT, named MTO-Platform (MToP). It incorporates more than 30 MTEAs, more than 150 MTO problem cases with real-world applications, and more than 10 performance metrics. Moreover, to enable comparison of MTEAs with traditional evolutionary algorithms, we modified more than 30 popular single-task evolutionary algorithms so that they can solve MTO problems in MToP. MToP is a user-friendly tool with a graphical user interface that makes it easy to analyze results, export data, and plot schematics. More importantly, MToP is extensible, allowing users to develop new algorithms and define new problems. The source code of MToP is available at https://github.com/intLyc/MTO-Platform.
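
As a rough illustration of what solving several tasks at once with inter-task knowledge transfer can look like, the sketch below evolves one population per task and occasionally draws a crossover parent from another task's population. It is a toy under stated assumptions, not MToP's interface and not any specific published MTEA: the function names, the two sphere-like tasks, and the `transfer_prob` parameter are all hypothetical.

```python
import random


def sphere(x):
    """Toy task 1: minimum at 0.2 in every dimension."""
    return sum((xi - 0.2) ** 2 for xi in x)


def shifted_sphere(x):
    """Toy task 2: minimum at 0.3 in every dimension."""
    return sum((xi - 0.3) ** 2 for xi in x)


def evolve_multitask(tasks, dim=5, pop_size=20, generations=50,
                     transfer_prob=0.3, seed=0):
    """Evolve one population per task in a shared [0, 1]^dim search space."""
    rng = random.Random(seed)
    pops = [[[rng.random() for _ in range(dim)] for _ in range(pop_size)]
            for _ in tasks]
    history = [[] for _ in tasks]  # best objective value per generation
    for _ in range(generations):
        new_pops = []
        for t, objective in enumerate(tasks):
            offspring = []
            for _ in range(pop_size):
                p1 = rng.choice(pops[t])
                # Knowledge transfer: sometimes the second parent comes
                # from a different task's population.
                if len(tasks) > 1 and rng.random() < transfer_prob:
                    other = rng.choice(
                        [k for k in range(len(tasks)) if k != t])
                    p2 = rng.choice(pops[other])
                else:
                    p2 = rng.choice(pops[t])
                # Uniform crossover followed by clipped Gaussian mutation.
                child = [a if rng.random() < 0.5 else b
                         for a, b in zip(p1, p2)]
                child = [min(1.0, max(0.0, g + rng.gauss(0.0, 0.05)))
                         for g in child]
                offspring.append(child)
            # Elitist selection: keep the best pop_size of parents + offspring.
            survivors = sorted(pops[t] + offspring, key=objective)[:pop_size]
            new_pops.append(survivors)
            history[t].append(objective(survivors[0]))
        pops = new_pops
    return pops, history


populations, best_history = evolve_multitask([sphere, shifted_sphere])
```

Because the two toy tasks have nearby optima, individuals borrowed from the other population tend to be useful starting points; when tasks are unrelated, the same transfer can hurt, which is one reason a common benchmarking platform for comparing MTEAs is valuable.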