Chong Meng

Improving the Robustness of Large Language Models via Consistency Alignment

Mar 22, 2024

Yukun Zhao, Lingyong Yan, Weiwei Sun, Guoliang Xing, Shuaiqiang Wang, Chong Meng, Zhicong Cheng, Zhaochun Ren, Dawei Yin

Abstract: Large language models (LLMs) have shown tremendous success in following user instructions and generating helpful responses. Nevertheless, their robustness is still far from optimal, as they may generate significantly inconsistent responses due to minor changes in the verbalized instructions. Recent literature has explored this inconsistency issue, highlighting the importance of continued improvement in the robustness of response generation. However, systematic analysis and solutions are still lacking. In this paper, we quantitatively define the inconsistency problem and propose a two-stage training framework consisting of instruction-augmented supervised fine-tuning and consistency alignment training. The first stage helps a model generalize on following instructions via similar instruction augmentations. In the second stage, we improve the diversity and help the model understand which responses are more aligned with human expectations by differentiating subtle differences in similar responses. The training process is accomplished by self-rewards inferred from the trained model at the first stage without referring to external human preference resources. We conduct extensive experiments on recent publicly available LLMs on instruction-following tasks and demonstrate the effectiveness of our training framework.

* Accepted by LREC-COLING 2024
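
The first stage mentioned in the abstract above, instruction augmentation, can be pictured with a minimal sketch. This is not the authors' implementation: the function name `augment_instructions` and the `paraphrases` mapping are hypothetical, and the sketch simply assumes that paraphrases of each instruction are already available from some source and that every paraphrase shares the original response.

```python
from typing import Dict, List, Tuple


def augment_instructions(
    examples: List[Tuple[str, str]],
    paraphrases: Dict[str, List[str]],
) -> List[Dict[str, str]]:
    """Expand (instruction, response) pairs with paraphrased instructions.

    Assumption: `paraphrases` maps an original instruction to rewordings
    that keep its meaning, so the original response is reused unchanged.
    """
    augmented: List[Dict[str, str]] = []
    for instruction, response in examples:
        variants = [instruction] + paraphrases.get(instruction, [])
        seen = set()
        for variant in variants:
            # Skip duplicate wordings so no training example is repeated.
            if variant in seen:
                continue
            seen.add(variant)
            augmented.append({"instruction": variant, "response": response})
    return augmented


# Toy data, invented purely for illustration.
examples = [("Translate 'cat' to French.", "chat")]
paraphrases = {
    "Translate 'cat' to French.": [
        "What is the French word for 'cat'?",
        "Translate 'cat' to French.",  # duplicate, will be dropped
    ]
}
augmented = augment_instructions(examples, paraphrases)
```

Each augmented record pairs a different wording with the same target response, which is one plausible way to expose a model to similar instructions during supervised fine-tuning.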

Knowing What LLMs DO NOT Know: A Simple Yet Effective Self-Detection Method

Oct 27, 2023

Yukun Zhao, Lingyong Yan, Weiwei Sun, Guoliang Xing, Chong Meng, Shuaiqiang Wang, Zhicong Cheng, Zhaochun Ren, Dawei Yin

Abstract: Large Language Models (LLMs) have shown great potential in Natural Language Processing (NLP) tasks. However, recent literature reveals that LLMs intermittently generate nonfactual responses, which undermines their reliability in further use. In this paper, we propose a novel self-detection method to identify the questions that an LLM does not know and for which it is prone to generating nonfactual results. Specifically, we first diversify the textual expressions for a given question and collect the corresponding answers. Then we examine the divergences between the generated answers to identify the questions for which the model may generate falsehoods. All of the above steps can be accomplished by prompting the LLMs themselves without referring to any other external resources. We conduct comprehensive experiments and demonstrate the effectiveness of our method on recently released LLMs, e.g., Vicuna, ChatGPT, and GPT-4.
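
The divergence check described in this abstract can be illustrated with a small sketch. It is an assumption-laden toy, not the paper's method: answers are treated as equivalent only if they match exactly after light normalization, divergence is scored as the Shannon entropy of the resulting answer groups, and the names `normalize`, `answer_divergence`, `likely_unknown`, and the threshold value are all hypothetical.

```python
import math
from collections import Counter
from typing import List


def normalize(answer: str) -> str:
    """Lowercase, trim, drop a trailing period, and collapse whitespace."""
    return " ".join(answer.lower().strip().rstrip(".").split())


def answer_divergence(answers: List[str]) -> float:
    """Entropy (in bits) of the answers collected for paraphrased questions.

    0.0 means every paraphrase produced the same answer; larger values
    mean the answers disagree more.
    """
    if not answers:
        raise ValueError("need at least one answer")
    counts = Counter(normalize(a) for a in answers)
    total = len(answers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def likely_unknown(answers: List[str], threshold: float = 1.0) -> bool:
    """Flag a question when its answers diverge at least `threshold` bits.

    The threshold is an illustrative assumption, not a tuned value.
    """
    return answer_divergence(answers) >= threshold


consistent = ["Paris", "paris.", " Paris "]
divergent = ["Paris", "Lyon", "Nice", "Paris"]
```

In practice the answers would come from prompting the same model with several rewordings of one question; exact matching is the crudest possible equivalence test and would likely be replaced by a semantic comparison.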

DiQAD: A Benchmark Dataset for End-to-End Open-domain Dialogue Assessment

Oct 25, 2023

Yukun Zhao, Lingyong Yan, Weiwei Sun, Chong Meng, Shuaiqiang Wang, Zhicong Cheng, Zhaochun Ren, Dawei Yin

Abstract: Dialogue assessment plays a critical role in the development of open-domain dialogue systems. Existing work is incapable of providing an end-to-end, human-epistemic assessment dataset: it either provides only sub-metrics such as coherence, or its dialogues are conversations between annotators, far from real user settings. In this paper, we release a large-scale dialogue quality assessment dataset (DiQAD) for automatically assessing open-domain dialogue quality. Specifically, we (1) establish the assessment criteria based on the dimensions conforming to human judgements on dialogue qualities, and (2) annotate large-scale dialogues conversed between real users based on these annotation criteria, yielding a dataset of around 100,000 dialogues. We conduct several experiments and report the performances of the baselines as the benchmark on DiQAD. The dataset is openly accessible at https://github.com/yukunZhao/Dataset_Dialogue_quality_evaluation.

* Accepted to Findings of EMNLP 2023

Exposing Length Divergence Bias of Textual Matching Models

Sep 06, 2021

Lan Jiang, Tianshu Lyu, Chong Meng, Xiaoyong Lyu, Dawei Yin

Abstract: Despite the remarkable success deep models have achieved in Textual Matching (TM), their robustness is still a topic of concern. In this work, we propose a new perspective for studying this issue: the length divergence bias of TM models. We conclude that this bias stems from two sources: the label bias of existing TM datasets and the sensitivity of TM models to superficial information. We critically examine widely used TM datasets and find that all of them follow specific length divergence distributions by label, providing direct cues for predictions. As for the TM models, we conduct adversarial evaluation and show that all models' performances drop on the out-of-distribution adversarial test sets we construct, which demonstrates that they are all misled by biased training sets. This is also confirmed by the SentLen probing task, which shows that all models capture rich length information during training to facilitate their performances. Finally, to alleviate the length divergence bias in TM models, we propose a practical adversarial training method using bias-free training data. Our experiments indicate that we successfully improve the robustness and generalization ability of models at the same time.
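
The dataset-side observation in this abstract, that length divergence is distributed differently across labels, can be checked with a short sketch. The definition used here is an assumption made for illustration: length divergence is taken as the absolute difference in whitespace-token counts between the two texts of a pair, which may differ from the paper's exact definition. The function names and toy pairs are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def length_divergence(text_a: str, text_b: str) -> int:
    """Absolute difference in whitespace-token counts (assumed definition)."""
    return abs(len(text_a.split()) - len(text_b.split()))


def mean_divergence_by_label(
    pairs: Iterable[Tuple[str, str, int]],
) -> Dict[int, float]:
    """Average length divergence of text pairs, grouped by label."""
    buckets: Dict[int, List[int]] = defaultdict(list)
    for text_a, text_b, label in pairs:
        buckets[label].append(length_divergence(text_a, text_b))
    return {label: mean(values) for label, values in buckets.items()}


# Toy pairs invented for illustration: label 1 = match, label 0 = no match.
pairs = [
    ("how old are you", "what is your age", 1),
    ("is it raining", "is it raining now", 1),
    ("hello", "where can I buy a train ticket today", 0),
    ("good morning", "the stock market fell sharply this week again", 0),
]
by_label = mean_divergence_by_label(pairs)
```

A large gap between the per-label averages, as in this toy data, is the kind of cue a model could exploit without reading the content of either text.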
