Abstract: Explanation generation plays a more pivotal role than fact verification alone in producing interpretable results and facilitating comprehensive fact-checking, and it has recently garnered considerable attention. However, previous studies on explanation generation have shown several limitations: they are confined to English scenarios, involve overly complex inference processes, and do not fully unleash the potential of mutual feedback between veracity labels and explanation texts. To address these issues, we construct two complex fact-checking datasets for the Chinese scenario: CHEF-EG and TrendFact. These datasets cover complex facts in areas such as health, politics, and society, and present significant challenges for fact verification methods. In response to these challenges, we propose a unified framework called FactISR (Augmenting Fact-Checking via Iterative Self-Revision) that performs mutual feedback between veracity and explanations by leveraging the capabilities of large language models (LLMs). FactISR uses a single model to address both fact verification and explanation generation. Its self-revision mechanism further refines the consistency between veracity labels, explanation texts, and evidence, and eliminates irrelevant noise. We conducted extensive experiments with baselines and FactISR on the proposed datasets. The experimental results demonstrate the effectiveness of our method.
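To make the iterative self-revision idea concrete, the following is a minimal sketch of such a loop, assuming a generic llm text-completion callable; the prompts, label set, and stopping rule are illustrative assumptions rather than FactISR's actual design.
\begin{verbatim}
# Minimal sketch of an iterative self-revision loop in the spirit of FactISR.
# The `llm` callable, prompt wording, and stopping rule are illustrative
# assumptions, not the paper's actual implementation.
from typing import Callable

def fact_check_with_self_revision(
    claim: str,
    evidence: str,
    llm: Callable[[str], str],
    max_rounds: int = 3,
) -> tuple[str, str]:
    """Produce a veracity label and an explanation with a single model,
    then let the model revise both until they are mutually consistent."""
    label = llm(f"Claim: {claim}\nEvidence: {evidence}\n"
                "Answer with one label: SUPPORTED / REFUTED / NOT ENOUGH INFO.")
    explanation = llm(f"Claim: {claim}\nEvidence: {evidence}\nLabel: {label}\n"
                      "Write a short explanation justifying this label.")
    for _ in range(max_rounds):
        verdict = llm(
            f"Claim: {claim}\nEvidence: {evidence}\n"
            f"Label: {label}\nExplanation: {explanation}\n"
            "Are the label and explanation consistent with the evidence? "
            "Reply CONSISTENT, or describe how to revise them."
        )
        if verdict.strip().upper().startswith("CONSISTENT"):
            break
        # Feed the critique back so the label and explanation revise each other.
        label = llm(f"{verdict}\nReturn only the revised label.")
        explanation = llm(f"{verdict}\nReturn only the revised explanation.")
    return label, explanation
\end{verbatim}
The point of the sketch is only that one model handles verification, explanation, and revision, with each output conditioned on the other.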
Abstract: Iterative preference optimization has recently become one of the de-facto training paradigms for large language models (LLMs), but its performance is still underwhelming due to the large amount of noisy preference data yielded in the loop. To combat this issue, we present an \textbf{U}ncertainty-enhanced \textbf{P}reference \textbf{O}ptimization (UPO) framework that makes the LLM self-evolve with reliable feedback. The key idea is to mitigate the noisy preference data derived from the current policy and reward models by performing pair-wise uncertainty estimation and judiciously sampling reliable feedback. To this end, we introduce an estimator model, which incorporates Monte Carlo (MC) dropout in a Bayesian neural network (BNN) to estimate the uncertainty of the preference data derived from the LLM policy. Compared to existing methods that directly filter generated responses based on the reward score, the estimator focuses on model uncertainty in a pair-wise manner and effectively bypasses the confirmation bias problem of the reward model. Additionally, we propose an uncertainty-enhanced self-evolution algorithm to improve the robustness of preference optimization and encourage the LLM to generate responses with both high reward and high certainty. Extensive experiments over multiple benchmarks demonstrate that our framework substantially alleviates the noise problem and improves the performance of iterative preference optimization.
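For intuition, the sketch below shows pair-wise uncertainty estimation with MC dropout in PyTorch; the PairwiseEstimator architecture, the pair feature encoding, and the filtering threshold are illustrative assumptions, not the paper's exact estimator.
\begin{verbatim}
# Minimal sketch of pair-wise uncertainty estimation with MC dropout,
# in the spirit of the UPO estimator. The architecture, feature encoding,
# and threshold below are illustrative assumptions.
import torch
import torch.nn as nn

class PairwiseEstimator(nn.Module):
    """Scores how likely response A is preferred over response B."""
    def __init__(self, dim: int, hidden: int = 256, p_drop: float = 0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * dim, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, 1),
        )

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.net(torch.cat([feat_a, feat_b], dim=-1)))

@torch.no_grad()
def mc_dropout_uncertainty(model: PairwiseEstimator,
                           feat_a: torch.Tensor,
                           feat_b: torch.Tensor,
                           n_samples: int = 20):
    """Run several stochastic forward passes (dropout kept active) and
    return the mean preference probability and its standard deviation."""
    model.train()  # keep dropout active to sample from the approximate posterior
    probs = torch.stack([model(feat_a, feat_b) for _ in range(n_samples)])
    return probs.mean(dim=0), probs.std(dim=0)

# Usage: keep only preference pairs the estimator is certain about.
est = PairwiseEstimator(dim=768)
fa, fb = torch.randn(4, 768), torch.randn(4, 768)   # dummy pair features
mean_p, std_p = mc_dropout_uncertainty(est, fa, fb)
reliable = std_p.squeeze(-1) < 0.1                   # illustrative threshold
\end{verbatim}
Filtering on the standard deviation rather than the raw reward score is what lets the estimator skip pairs the reward model is confidently wrong about.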
Abstract: With the growth of the economy and society, enterprises, especially in the FinTech industry, have an increasing demand for outbound calls to customers, such as debt collection, marketing, and anti-fraud calls. However, a large amount of repetitive and mechanical work occupies most of human agents' time, so the equipment and labor costs for enterprises increase accordingly. At the same time, with the development of artificial intelligence technology over the past few decades, it has become quite common for companies to use new technologies such as Big Data and artificial intelligence to empower their outbound call businesses. The intelligent outbound robot is a typical application of artificial intelligence technology in the field of outbound call businesses. It is mainly used to communicate with customers in order to accomplish a certain goal. It has the characteristics of low cost, high reusability, and easy compliance, which has attracted increasing attention from the industry. At present, there are two kinds of intelligent outbound robots in the industry, but both still leave large room for improvement. One kind is based on a finite state machine and relies on manually configured jump conditions and corresponding nodes. This kind of intelligent outbound robot is also called a flow-based robot. For example, the schematic diagram of the working model of a flow-based robot for debt collection is shown in Fig.\ref{fig:label}. In each round, the robot replies to the user with the words corresponding to the current node.
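For intuition, the sketch below shows a toy flow-based robot as a finite state machine; the node names, reply texts, and jump conditions are illustrative assumptions mirroring the manually configured flows described above.
\begin{verbatim}
# Minimal sketch of a flow-based (finite-state-machine) outbound robot for
# debt collection. Node names, replies, and jump conditions are illustrative
# assumptions; real flows are configured from manual experience.
from dataclasses import dataclass, field

@dataclass
class Node:
    reply: str                                            # words spoken at this node
    jumps: dict[str, str] = field(default_factory=dict)   # user intent -> next node

FLOW = {
    "greeting": Node("Hello, this is a reminder about your overdue payment.",
                     {"confirm": "ask_repayment", "deny": "verify_identity"}),
    "verify_identity": Node("Could you confirm your name and account number?",
                            {"confirm": "ask_repayment", "deny": "closing"}),
    "ask_repayment": Node("When do you plan to repay the outstanding amount?",
                          {"promise": "closing", "refuse": "negotiate"}),
    "negotiate": Node("We can offer an installment plan. Would that work?",
                      {"confirm": "closing", "refuse": "closing"}),
    "closing": Node("Thank you for your time. Goodbye.", {}),
}

def run_turn(state: str, user_intent: str) -> tuple[str, str]:
    """One dialogue round: jump according to the detected intent, then reply
    with the words configured at the new node."""
    next_state = FLOW[state].jumps.get(user_intent, state)  # stay put if no rule matches
    return next_state, FLOW[next_state].reply

state = "greeting"
print(FLOW[state].reply)                    # robot opens the call
state, reply = run_turn(state, "confirm")   # user confirms, flow jumps to ask_repayment
print(reply)
\end{verbatim}
The strength and the weakness of this design are the same: every jump condition must be configured by hand, which keeps behavior predictable but limits coverage of unexpected user responses.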