Abstract: Large language models (LLMs) often produce factually incorrect responses despite their success in various applications. In this paper, we hypothesize that heavy reliance on the simple co-occurrence statistics of the pre-training corpora is one of the main factors causing factual errors. Our results reveal that LLMs are vulnerable to co-occurrence bias, defined as a preference for frequently co-occurring words over the correct answer. Consequently, LLMs struggle to recall facts whose subject and object rarely co-occur in the pre-training dataset, even though these facts are seen during finetuning. We show that co-occurrence bias persists despite scaling up model size or finetuning. Therefore, we suggest mitigating the bias by finetuning on a debiased dataset, constructed by filtering out biased samples whose subject-object co-occurrence counts are high. Although debiased finetuning allows LLMs to memorize rare facts in the training set, it is not effective for recalling rare facts unseen during finetuning. Further research on mitigation will help build reliable language models by preventing potential factual errors. The code is available at \url{https://github.com/CheongWoong/impact_of_cooccurrence}.
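As a rough illustration of the debiasing idea in this abstract, the sketch below counts subject-object co-occurrences over a corpus and drops finetuning samples whose count exceeds a threshold. The corpus layout, sample field names, and the threshold are assumptions for illustration only, not the authors' actual pipeline.

```python
# Minimal sketch of co-occurrence-based filtering (assumed data layout, not the paper's exact pipeline).
from collections import Counter

def count_cooccurrences(corpus_docs, facts):
    """Count how often each (subject, object) pair appears together in a document."""
    counts = Counter()
    for doc in corpus_docs:                      # corpus_docs: iterable of raw text strings (assumption)
        text = doc.lower()
        for subj, obj in facts:                  # facts: list of (subject, object) string pairs (assumption)
            if subj.lower() in text and obj.lower() in text:
                counts[(subj, obj)] += 1
    return counts

def debias_finetuning_set(samples, counts, max_count=100):
    """Keep only samples whose subject-object co-occurrence count is below the threshold."""
    return [s for s in samples
            if counts[(s["subject"], s["object"])] < max_count]   # field names are assumptions

# Example usage with toy data.
corpus = ["Paris is the capital of France.", "France borders Spain and Italy."]
facts = [("France", "Paris"), ("France", "Madrid")]
samples = [{"subject": "France", "object": "Paris", "relation": "capital"},
           {"subject": "France", "object": "Madrid", "relation": "capital"}]
cooc = count_cooccurrences(corpus, facts)
print(debias_finetuning_set(samples, cooc, max_count=1))
```

With the toy threshold above, the frequently co-occurring (France, Paris) sample is filtered out while the rare pair is kept, which is the intended effect of debiased finetuning.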
Abstract: Many contextualized word representations are now learned by intricate neural network models, such as masked neural language models (MNLMs), which are built from huge neural network structures and trained to restore masked text. Such representations demonstrate superhuman performance on some reading comprehension (RC) tasks, which require extracting the proper answer from a context given a question. However, identifying the detailed knowledge encoded in MNLMs is challenging owing to their numerous and intermingled parameters. This paper provides new insights and empirical analyses of the commonsense knowledge included in pretrained MNLMs. First, we use a diagnostic test that evaluates whether commonsense knowledge is properly trained in MNLMs. We observe that a large proportion of commonsense knowledge is not appropriately trained in MNLMs and that MNLMs often do not accurately understand the semantic meaning of relations. In addition, we find that MNLM-based RC models are still vulnerable to semantic variations that require commonsense knowledge. Finally, we discover the fundamental reason why some knowledge is not trained. We further suggest that utilizing an external commonsense knowledge repository can be an effective solution, and we exemplify, in controlled experiments, the possibility of overcoming the limitations of MNLM-based RC models by enriching text with the required knowledge from an external commonsense knowledge repository.
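One way to picture such a diagnostic test is a cloze-style probe: mask the object of a commonsense triple and check whether the MNLM ranks the correct word highly. The sketch below uses the Hugging Face fill-mask pipeline purely as an illustration; the prompt template and top-k scoring rule are assumptions, not the paper's actual test.

```python
# Cloze-style probe of a masked language model (illustrative only; prompt and scoring are assumptions).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def probe(subject, relation_template, gold_object, top_k=5):
    """Return whether the gold object appears among the model's top-k mask predictions."""
    prompt = relation_template.format(subject=subject, mask=fill_mask.tokenizer.mask_token)
    predictions = fill_mask(prompt, top_k=top_k)
    predicted_tokens = [p["token_str"].strip().lower() for p in predictions]
    return gold_object.lower() in predicted_tokens, predicted_tokens

# Example: probe a "UsedFor"-style relation with a hand-written template (assumption).
hit, preds = probe("a pen", "{subject} is used for {mask}.", "writing")
print(hit, preds)
```

Aggregating such hit rates over many triples and relation types gives one rough measure of how much commonsense knowledge a pretrained MNLM recalls, in the spirit of the diagnostic test described above.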
Abstract: Currently, contextualized word representations are learned by intricate neural network models, such as masked neural language models (MNLMs). These new representations have significantly enhanced performance on automated question answering that involves reading paragraphs. However, identifying the detailed knowledge trained in MNLMs is difficult owing to their numerous and intermingled parameters. This paper provides empirical yet insightful analyses of pretrained MNLMs with respect to commonsense knowledge. First, we propose a test that measures which types of commonsense knowledge pretrained MNLMs understand. From the test, we observe that MNLMs partially understand various types of commonsense knowledge but do not accurately understand the semantic meaning of relations. In addition, based on the difficulty of the question-answering problems, we observe that pretrained MNLM-based models are still vulnerable to problems that require commonsense knowledge. We also experimentally demonstrate that existing MNLM-based models can be improved by combining knowledge from an external commonsense repository.
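To make the knowledge-enrichment idea concrete, the snippet below prepends verbalized triples from a small external knowledge store to the passage before passing it to an extractive QA model. The triple store, lookup logic, and verbalization template are hypothetical placeholders, not the repository or integration method used in the paper.

```python
# Sketch of enriching a QA passage with external commonsense triples
# (the triple store and verbalization template are hypothetical).
from transformers import pipeline

# Hypothetical stand-in for an external commonsense repository.
KNOWLEDGE_STORE = {
    "umbrella": [("umbrella", "UsedFor", "staying dry in the rain")],
    "rain": [("rain", "CausesDesire", "staying indoors")],
}

def verbalize(triple):
    subj, rel, obj = triple
    return f"{subj} is related to {obj} ({rel})."        # naive template (assumption)

def enrich(passage, question):
    """Prepend verbalized triples for any known concept mentioned in the question."""
    extra = [verbalize(t)
             for concept, triples in KNOWLEDGE_STORE.items()
             if concept in question.lower()
             for t in triples]
    return " ".join(extra) + " " + passage if extra else passage

qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
passage = "Alice grabbed her umbrella before leaving because the sky was dark."
question = "Why did Alice take an umbrella?"
print(qa(question=question, context=enrich(passage, question)))
```

The design choice here is deliberately simple: knowledge is injected as plain text in the context, so any off-the-shelf RC model can consume it without architectural changes.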