
Title:ContraSolver: Self-Alignment of Language Models by Resolving Internal Preference Contradictions

Jun 13, 2024

Xu Zhang, Xunjian Yin, Xiaojun Wan



Abstract: While substantial advances have been made in developing large language models (LLMs), controlling their behavior remains difficult. Direct preference optimization (DPO) assumes the existence of a latent reward function that evaluates the responses of LLMs. This assumption implies a strict preference ordering among different responses to the same input. However, our experimental observations show that the preferences of LLMs always contain contradictions. In this paper, we use self-annotation to construct a graph of the preference relationships among different responses, which lets us find contradictions in the preference order. We propose ContraSolver, an algorithm that traverses all edges of the preference graph to identify those that might cause contradictions. ContraSolver initializes the graph with a maximum spanning tree and then identifies contradictory edges, prioritizing the resolution of low-confidence preferences while preserving high-confidence ones. Experimental results on four different generation tasks show that this completely unsupervised self-alignment substantially improves the performance of different LLMs. Furthermore, by analyzing the preference graphs of LLMs with and without ContraSolver self-alignment, we quantify the reduction in contradictions, suggesting that resolving preference contradictions is crucial for achieving better alignment.
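The abstract describes the algorithm only at a high level, so the following is a minimal sketch of the idea under stated assumptions, not the authors' implementation. It assumes each self-annotated judgment is a directed edge (winner beats loser) carrying a confidence weight; the names `Pref` and `find_contradictions` and the confidence values are hypothetical. Kruskal's algorithm (processing edges from highest to lowest confidence) is used here to build the maximum spanning tree, and a depth-first search is used as the contradiction test: a remaining edge is flagged when the loser already transitively beats the winner, since adding it would close a directed cycle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pref:
    """Hypothetical self-annotated preference: response `winner` beats `loser`."""
    winner: int
    loser: int
    confidence: float  # assumed model confidence in this judgment, in [0, 1]


def _find(parent: list[int], x: int) -> int:
    # Union-find root lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def _reachable(adj: dict[int, set[int]], src: int, dst: int) -> bool:
    # Iterative DFS over the directed edges accepted so far.
    stack, seen = [src], {src}
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def find_contradictions(n: int, prefs: list[Pref]) -> tuple[list[Pref], list[Pref]]:
    """Split preferences over n responses into (accepted, contradictory) edges."""
    ordered = sorted(prefs, key=lambda p: p.confidence, reverse=True)
    parent = list(range(n))
    adj: dict[int, set[int]] = {i: set() for i in range(n)}
    accepted: list[Pref] = []
    remaining: list[Pref] = []

    # Step 1: Kruskal's algorithm on the undirected graph, highest confidence
    # first, yields a maximum spanning tree (or forest). A tree has no cycles,
    # so these high-confidence edges are mutually consistent.
    for p in ordered:
        ru, rv = _find(parent, p.winner), _find(parent, p.loser)
        if ru != rv:
            parent[ru] = rv
            adj[p.winner].add(p.loser)
            accepted.append(p)
        else:
            remaining.append(p)

    # Step 2: visit the non-tree edges, still highest confidence first. If the
    # loser already (transitively) beats the winner, this edge would close a
    # directed cycle, so it is flagged as contradictory; otherwise it is kept.
    contradictions: list[Pref] = []
    for p in remaining:
        if _reachable(adj, p.loser, p.winner):
            contradictions.append(p)
        else:
            adj[p.winner].add(p.loser)
            accepted.append(p)
    return accepted, contradictions


# Three responses with a cyclic (contradictory) set of judgments.
prefs = [Pref(0, 1, 0.9), Pref(1, 2, 0.8), Pref(2, 0, 0.6)]
accepted, contradictions = find_contradictions(3, prefs)
print(contradictions)  # [Pref(winner=2, loser=0, confidence=0.6)]
```

Because edges are visited in descending confidence, a cycle is always broken at its lowest-confidence edge, which mirrors the stated goal of resolving low-confidence preferences while preserving high-confidence ones. How the paper then uses the flagged edges during self-alignment is not specified in the abstract and is not modeled here.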
