Georgi Georgiev

OpenFactCheck: A Unified Framework for Factuality Evaluation of LLMs

May 09, 2024

Yuxia Wang, Minghan Wang, Hasan Iqbal, Georgi Georgiev, Jiahui Geng, Preslav Nakov

Abstract: The increased use of large language models (LLMs) across a variety of real-world applications calls for mechanisms to verify the factual accuracy of their outputs. Difficulties lie in assessing the factuality of free-form responses in open domains. Also, different papers use disparate evaluation benchmarks and measurements, which renders them hard to compare and hampers future progress. To mitigate these issues, we propose OpenFactCheck, a unified factuality evaluation framework for LLMs. OpenFactCheck consists of three modules: (i) CUSTCHECKER allows users to easily customize an automatic fact-checker and verify the factual correctness of documents and claims, (ii) LLMEVAL is a unified evaluation framework that fairly assesses an LLM's factuality from various perspectives, and (iii) CHECKEREVAL is an extensible solution for gauging the reliability of automatic fact-checkers' verification results using human-annotated datasets. OpenFactCheck is publicly released at https://github.com/yuxiaw/OpenFactCheck.

* 19 pages, 8 tables, 8 figures

Factuality of Large Language Models in the Year 2024

Feb 09, 2024

Yuxia Wang, Minghan Wang, Muhammad Arslan Manzoor, Fei Liu, Georgi Georgiev, Rocktim Jyoti Das, Preslav Nakov

Abstract: Large language models (LLMs), especially when instruction-tuned for chat, have become part of our daily lives, freeing people from the process of searching, extracting, and integrating information from multiple sources by offering a straightforward answer to a variety of questions in a single place. Unfortunately, in many cases, LLM responses are factually incorrect, which limits their applicability in real-world scenarios. As a result, research on evaluating and improving the factuality of LLMs has attracted a lot of attention recently. In this survey, we critically analyze existing work with the aim of identifying the major challenges and their associated causes, pointing to potential solutions for improving the factuality of LLMs, and analyzing the obstacles to automated factuality evaluation for open-ended text generation. We further offer an outlook on where future research should go.

* 9 pages, 1 figure and 2 tables

Leaf: Multiple-Choice Question Generation

Jan 22, 2022

Kristiyan Vachev, Momchil Hardalov, Georgi Karadzhov, Georgi Georgiev, Ivan Koychev, Preslav Nakov

Abstract: Testing with quiz questions has proven to be an effective way to assess and improve the educational process. However, manually creating quizzes is tedious and time-consuming. To address this challenge, we present Leaf, a system for generating multiple-choice questions from factual text. In addition to being very well suited for the classroom, Leaf could also be used in an industrial setting, e.g., to facilitate onboarding and knowledge sharing, or as a component of chatbots, question answering systems, or Massive Open Online Courses (MOOCs). The code and the demo are available at https://github.com/KristiyanVachev/Leaf-Question-Generation.

* Accepted to ECIR 2022 (Demo)

Feature-Rich Named Entity Recognition for Bulgarian Using Conditional Random Fields

Sep 26, 2021

Georgi Georgiev, Preslav Nakov, Kuzman Ganchev, Petya Osenova, Kiril Ivanov Simov

Abstract: The paper presents a feature-rich approach to the automatic recognition and categorization of named entities (persons, organizations, locations, and miscellaneous) in news text for Bulgarian. We combine well-established features used for other languages with language-specific lexical, syntactic and morphological information. In particular, we make use of the rich tagset annotation of the BulTreeBank (680 morpho-syntactic tags), from which we derive suitable task-specific tagsets (local and nonlocal). We further add domain-specific gazetteers and additional unlabeled data, achieving F1=89.4%, which is comparable to the state-of-the-art results for English.

* RANLP-2009
* named entity recognition, NER, conditional random fields, CRF, Bulgarian, BulTreeBank

Exposing Paid Opinion Manipulation Trolls

Sep 26, 2021

Todor Mihaylov, Ivan Koychev, Georgi Georgiev, Preslav Nakov

Abstract: Recently, Web forums have been invaded by opinion manipulation trolls. Some trolls try to influence other users driven by their own convictions, while in other cases they can be organized and paid, e.g., by a political party or a PR agency that gives them specific instructions on what to write. Finding paid trolls automatically using machine learning is a hard task, as there is not enough training data to train a classifier; yet some test data can be obtained, as these trolls are sometimes caught and widely exposed. In this paper, we solve the training data problem by assuming that a user who is called a troll by several different people is likely to be such, and one who has never been called a troll is unlikely to be such. We compare the profiles of (i) paid trolls vs. (ii) "mentioned" trolls vs. (iii) non-trolls, and we further show that a classifier trained to distinguish (ii) from (iii) also does quite well at telling apart (i) from (iii).

* RANLP-2015
* opinion manipulation trolls, trolls, opinion manipulation, community forums, news media
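
The labeling assumption in the abstract above (a user called a troll by several different people is likely a troll; a user never called one is likely not) can be illustrated with a minimal sketch. This is not the authors' code: the function name `weak_labels`, the `(accuser, accused)` input format, the label strings, and the `min_accusers=2` threshold are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def weak_labels(accusations, all_users, min_accusers=2):
    """Assign weak training labels from troll accusations.

    accusations: iterable of (accuser, accused) pairs (hypothetical format).
    min_accusers: assumed threshold for "several different people".
    """
    # Collect the set of distinct accusers per accused user,
    # so repeated accusations by the same person count once.
    accusers_of = defaultdict(set)
    for accuser, accused in accusations:
        if accuser != accused:
            accusers_of[accused].add(accuser)

    labels = {}
    for user in all_users:
        n = len(accusers_of.get(user, ()))
        if n >= min_accusers:
            labels[user] = "mentioned_troll"
        elif n == 0:
            labels[user] = "non_troll"
        # Users accused by fewer than min_accusers people stay unlabeled:
        # the heuristic says nothing about them.
    return labels
```

Users who fall between the two conditions are deliberately left out of the training set in this sketch, since the stated assumption only covers the two extremes.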

Generating Answer Candidates for Quizzes and Answer-Aware Question Generators

Aug 29, 2021

Kristiyan Vachev, Momchil Hardalov, Georgi Karadzhov, Georgi Georgiev, Ivan Koychev, Preslav Nakov

Abstract: In education, open-ended quiz questions have become an important tool for assessing the knowledge of students. Yet, manually preparing such questions is a tedious task, and thus automatic question generation has been proposed as a possible alternative. So far, the vast majority of research has focused on generating the question text, relying on question answering datasets with readily picked answers, and the problem of how to come up with answer candidates in the first place has been largely ignored. Here, we aim to bridge this gap. In particular, we propose a model that can generate a specified number of answer candidates for a given passage of text, which can then be used by instructors to write questions manually or can be passed as an input to automatic answer-aware question generators. Our experiments show that our proposed answer candidate generation model outperforms several baselines.

* RANLP-2021 (SRW)
* answer generation, question generation, answer-aware question generation, quiz questions, question answering

Feature-Rich Part-of-speech Tagging for Morphologically Complex Languages: Application to Bulgarian

Nov 26, 2019

Georgi Georgiev, Valentin Zhikov, Petya Osenova, Kiril Simov, Preslav Nakov

Abstract: We present experiments with part-of-speech tagging for Bulgarian, a Slavic language with rich inflectional and derivational morphology. Unlike most previous work, which has used a small number of grammatical categories, we work with 680 morpho-syntactic tags. We combine a large morphological lexicon with prior linguistic knowledge and guided learning from a POS-annotated corpus, achieving an accuracy of 97.98%, which is a significant improvement over the state of the art for Bulgarian.

* EACL-2012
* part-of-speech tagging, POS tagging, morpho-syntactic tags, guided learning, Bulgarian, Slavic

Where Classification Fails, Interpretation Rises

Dec 02, 2017

Chanh Nguyen, Georgi Georgiev, Yujie Ji, Ting Wang

Abstract: An intriguing property of deep neural networks is their inherent vulnerability to adversarial inputs, which significantly hinders their application in security-critical domains. Most existing detection methods attempt to use carefully engineered patterns to distinguish adversarial inputs from their genuine counterparts, which, however, can often be circumvented by adaptive adversaries. In this work, we take a completely different route by leveraging the definition of adversarial inputs: while deceptive to deep neural networks, they are barely discernible to human vision. Building upon recent advances in interpretable models, we construct a new detection framework that contrasts an input's interpretation against its classification. We validate the efficacy of this framework through extensive experiments using benchmark datasets and attacks. We believe that this work opens a new direction for designing adversarial input detection methods.

* 6 pages, 6 figures

The Case for Being Average: A Mediocrity Approach to Style Masking and Author Obfuscation

Jul 28, 2017

Georgi Karadjov, Tsvetomila Mihaylova, Yasen Kiprov, Georgi Georgiev, Ivan Koychev, Preslav Nakov

Abstract: Users posting online expect to remain anonymous unless they have logged in, which is often needed for them to be able to discuss various topics freely. Preserving the anonymity of a text's writer can also be important in some other contexts, e.g., in the case of witness protection or anonymity programs. However, each person has his/her own style of writing, which can be analyzed using stylometry, and as a result, the true identity of the author of a piece of text can be revealed even if s/he has tried to hide it. Thus, it could be helpful to design automatic tools that can help a person obfuscate his/her identity when writing text. In particular, here we propose an approach that changes the text, so that it is pushed towards average values for some general stylometric characteristics, thus making these characteristics less discriminative. The approach consists of three main steps: first, we calculate the values for some popular stylometric metrics that can indicate authorship; then we apply various transformations to the text, so that these metrics are adjusted towards the average level, while preserving the semantics and the soundness of the text; and finally, we add random noise. This approach turned out to be very efficient, and yielded the best performance on the Author Obfuscation task at the PAN-2016 competition.

* Best of the Labs Track at CLEF-2017
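
The three steps named in the abstract above (measure a stylometric metric, adjust the text towards the average, add random noise) can be sketched for a single metric. This is only an illustration under stated assumptions, not the authors' system: average sentence length is used as a stand-in metric, and the function names, the `corpus_average` value, the `tolerance` band, and the action labels are all hypothetical. The sketch only decides which direction to transform the text; it does not perform the rewriting itself.

```python
import random
import re


def avg_sentence_length(text):
    """Step 1: average number of words per sentence (one simple stylometric metric)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return sum(len(s.split()) for s in sentences) / len(sentences)


def plan_adjustment(text, corpus_average, tolerance=1.0, noise=0.0, rng=None):
    """Steps 2 and 3: pick a transformation that moves the metric towards
    a noisy version of the average (all parameters are assumptions)."""
    rng = rng or random.Random(0)
    # Step 3: perturb the target so the output does not sit exactly on the average.
    target = corpus_average + rng.uniform(-noise, noise)
    current = avg_sentence_length(text)
    # Step 2: choose the direction of change.
    if current > target + tolerance:
        return "split_sentences"
    if current < target - tolerance:
        return "merge_sentences"
    return "leave_as_is"
```

A full pipeline would repeat this decision for each metric and apply meaning-preserving rewrites, which is the hard part that this sketch leaves out.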
