Dongxu Zhang

Enhancing Hallucination Detection through Perturbation-Based Synthetic Data Generation in System Responses

Jul 07, 2024

Dongxu Zhang, Varun Gangal, Barrett Martin Lattimer, Yi Yang

Abstract:Detecting hallucinations in large language model (LLM) outputs is pivotal, yet traditional fine-tuning for this classification task is impeded by the expensive and quickly outdated annotation process, especially across numerous vertical domains and in the face of rapid LLM advancements. In this study, we introduce an approach that automatically generates both faithful and hallucinated outputs by rewriting system responses. Experimental findings demonstrate that a T5-base model, fine-tuned on our generated dataset, surpasses state-of-the-art zero-shot detectors and existing synthetic generation methods in both accuracy and latency, indicating the efficacy of our approach.

* ACL 2024 findings

A Distant Supervision Corpus for Extracting Biomedical Relationships Between Chemicals, Diseases and Genes

Apr 13, 2022

Dongxu Zhang, Sunil Mohan, Michaela Torkar, Andrew McCallum

Abstract:We introduce ChemDisGene, a new dataset for training and evaluating multi-class multi-label document-level biomedical relation extraction models. Our dataset contains 80k biomedical research abstracts labeled with mentions of chemicals, diseases, and genes, portions of which human experts labeled with 18 types of biomedical relationships between these entities (intended for evaluation), and the remainder of which (intended for training) has been distantly labeled via the CTD database with approximately 78% accuracy. In comparison to similar preexisting datasets, ours is both substantially larger and cleaner; it also includes annotations linking mentions to their entities. We also provide three baseline deep neural network relation extraction models trained and evaluated on our new dataset.

* LREC 2022 (Oral)

Improving Local Identifiability in Probabilistic Box Embeddings

Oct 29, 2020

Shib Sankar Dasgupta, Michael Boratko, Dongxu Zhang, Luke Vilnis, Xiang Lorraine Li, Andrew McCallum

Abstract:Geometric embeddings have recently received attention for their natural ability to represent transitive asymmetric relations via containment. Box embeddings, where objects are represented by n-dimensional hyperrectangles, are a particularly promising example of such an embedding as they are closed under intersection and their volume can be calculated easily, allowing them to naturally represent calibrated probability distributions. The benefits of geometric embeddings also introduce a problem of local identifiability, however, where whole neighborhoods of parameters result in equivalent loss, which impedes learning. Prior work addressed some of these issues by using an approximation to Gaussian convolution over the box parameters; however, this intersection operation also increases the sparsity of the gradient. In this work, we model the box parameters with min and max Gumbel distributions, which were chosen such that the space is still closed under the operation of intersection. The calculation of the expected intersection volume involves all parameters, and we demonstrate experimentally that this drastically improves the ability of such models to learn.

* Accepted at NeurIPS 2020
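
The closure property mentioned in the abstract can be illustrated with a small sketch. It relies on one known fact: the maximum of independent Gumbel variables that share a scale `beta` is again Gumbel, with location `beta * log(sum(exp(mu_i / beta)))`, and the mirrored statement holds for minima. Everything else is an assumption made for illustration: the function names, the representation of a box as a pair of location lists, and the softplus volume approximation with a `2 * gamma` offset. None of these details are given in the abstract.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def _soft_max(a, b, beta):
    """Numerically stable beta * log(exp(a / beta) + exp(b / beta))."""
    m = max(a, b)
    return m + beta * math.log(math.exp((a - m) / beta) + math.exp((b - m) / beta))


def gumbel_intersection(box_a, box_b, beta=1.0):
    """Intersect two boxes given as (min_locations, max_locations).

    The new min corner is a smooth maximum of the two min corners and the
    new max corner is a smooth minimum of the two max corners, so every
    input parameter influences the result.
    """
    mins = [_soft_max(a, b, beta) for a, b in zip(box_a[0], box_b[0])]
    maxs = [-_soft_max(-a, -b, beta) for a, b in zip(box_a[1], box_b[1])]
    return mins, maxs


def _softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def approx_expected_volume(box, beta=1.0):
    """Assumed approximation: product of softplus-smoothed side lengths."""
    volume = 1.0
    for lo, hi in zip(box[0], box[1]):
        volume *= beta * _softplus((hi - lo) / beta - 2.0 * EULER_GAMMA)
    return volume
```

With a small `beta` the intersection approaches the usual hard max/min of the corners. With a larger `beta`, even two disjoint boxes yield a small positive volume, so the loss is no longer flat across whole neighborhoods of parameters.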

OpenKI: Integrating Open Information Extraction and Knowledge Bases with Relation Inference

Apr 12, 2019

Dongxu Zhang, Subhabrata Mukherjee, Colin Lockard, Xin Luna Dong, Andrew McCallum

Abstract:In this paper, we consider advancing web-scale knowledge extraction and alignment by integrating OpenIE extractions in the form of (subject, predicate, object) triples with Knowledge Bases (KB). Traditional techniques from universal schema and from schema mapping fall into two extremes: either they perform instance-level inference relying on embeddings for (subject, object) pairs, and thus cannot handle pairs absent from any existing triples; or they perform predicate-level mapping and completely ignore background evidence from individual entities, and thus cannot achieve satisfactory quality. We propose OpenKI to handle the sparsity of OpenIE extractions by performing instance-level inference: for each entity, we encode the rich information in its neighborhood in both KB and OpenIE extractions, and leverage this information in relation inference by exploring different methods of aggregation and attention. In order to handle unseen entities, our model is designed without creating entity-specific parameters. Extensive experiments show that this method not only significantly improves on the state of the art for conventional OpenIE extractions like ReVerb, but also boosts the performance on OpenIE from semi-structured data, where new entity pairs are abundant and data are fairly sparse.

Search-Guided, Lightly-supervised Training of Structured Prediction Energy Networks

Dec 22, 2018

Amirmohammad Rooshenas, Dongxu Zhang, Gopal Sharma, Andrew McCallum

Abstract:In structured output prediction tasks, labeling ground-truth training output is often expensive. However, for many tasks, even when the true output is unknown, we can evaluate predictions using a scalar reward function, which may be easily assembled from human knowledge or non-differentiable pipelines. But searching through the entire output space to find the best output with respect to this reward function is typically intractable. In this paper, we instead use efficient truncated randomized search in this reward function to train structured prediction energy networks (SPENs), which provide efficient test-time inference using gradient-based search on a smooth, learned representation of the score landscape, and have previously yielded state-of-the-art results in structured prediction. In particular, this truncated randomized search in the reward function yields previously unknown local improvements, providing effective supervision to SPENs, avoiding their traditional need for labeled training data.
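
The idea of a truncated randomized search over a reward function can be sketched as follows. This is a minimal illustration, not the paper's procedure: the binary output encoding, single-position flips as proposals, the `margin` parameter, and the function name `truncated_random_search` are all assumptions introduced here.

```python
import random


def truncated_random_search(y, reward, max_steps=20, margin=0.0, rng=None):
    """Look for a one-flip neighbor of y with a higher reward.

    y is a list of 0/1 labels and reward maps such a list to a scalar.
    The search stops at the first improvement, or gives up after
    max_steps random proposals (the truncation) and returns None.
    """
    rng = rng or random.Random()
    base = reward(y)
    for _ in range(max_steps):
        i = rng.randrange(len(y))
        candidate = list(y)
        candidate[i] = 1 - candidate[i]  # flip one randomly chosen label
        if reward(candidate) > base + margin:
            return candidate
    return None
```

A pair `(y, candidate)` found this way says only that `candidate` should score better than `y`. That ranking signal is the kind of supervision that could be passed to an energy network without any labeled output.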

Word Embedding Perturbation for Sentence Classification

Apr 22, 2018

Dongxu Zhang, Zhichao Yang

Abstract:In this technical report, we aim to mitigate overfitting in natural language tasks by applying data augmentation methods. Specifically, we try several types of noise to perturb the input word embeddings, such as Gaussian noise, Bernoulli noise, and adversarial noise. We also apply several constraints to the different types of noise. With these data augmentation methods, the baseline models gain improvements on several sentence classification tasks.
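
Two of the noise types named in the abstract can be sketched in a few lines. This is an illustration under stated assumptions, not the report's implementation: the Gaussian noise is taken to be additive, the Bernoulli noise is taken to be a dropout-style mask with rescaling by `1 / keep_prob`, and the function names and default values are hypothetical.

```python
import random


def gaussian_perturb(embedding, sigma=0.1, rng=None):
    """Add independent N(0, sigma^2) noise to each embedding component."""
    rng = rng or random.Random()
    return [x + rng.gauss(0.0, sigma) for x in embedding]


def bernoulli_perturb(embedding, keep_prob=0.9, rng=None):
    """Zero each component with probability 1 - keep_prob.

    Kept components are rescaled by 1 / keep_prob so the expected value
    of every component is unchanged (an assumed, dropout-style choice).
    """
    rng = rng or random.Random()
    return [x / keep_prob if rng.random() < keep_prob else 0.0 for x in embedding]
```

Both functions return a new list and leave the input untouched, so a training loop can draw a fresh perturbation of the same embedding at every step.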

Relation Classification via Recurrent Neural Network

Dec 25, 2015

Dongxu Zhang, Dong Wang

Abstract:Deep learning has achieved much success in sentence-level relation classification. For example, convolutional neural networks (CNN) have delivered competitive performance without the heavy feature engineering that conventional pattern-based methods require. Thus, many works have built on CNN structures. However, a key issue that has not been well addressed by CNN-based methods is their limited capability to learn temporal features, especially long-distance dependencies between nominal pairs. In this paper, we propose a simple framework based on recurrent neural networks (RNN) and compare it with a CNN-based model. To show the limitations of the popularly used SemEval-2010 Task 8 dataset, we introduce another dataset refined from MIMLRE (Angeli et al., 2014). Experiments on the two datasets strongly indicate that the RNN-based model can deliver better performance on relation classification, and that it is particularly capable of learning long-distance relation patterns. This makes it suitable for real-world applications where complicated expressions are often involved.

Learning from LDA using Deep Neural Networks

Aug 05, 2015

Dongxu Zhang, Tianyi Luo, Dong Wang, Rong Liu

Abstract:Latent Dirichlet Allocation (LDA) is a three-level hierarchical Bayesian model for topic inference. In spite of its great success, inferring the latent topic distribution with LDA is time-consuming. Motivated by the transfer learning approach proposed by Hinton et al. (2015), we present a novel method that uses LDA to supervise the training of a deep neural network (DNN), so that the DNN can approximate the costly LDA inference with less computation. Our experiments on a document classification task show that a simple DNN can learn the LDA behavior pretty well, while the inference is sped up tens or hundreds of times.
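
Supervising a network with LDA outputs can be sketched as a soft-target loss. The abstract does not state the training objective, so the KL divergence below is an assumption, as are the function names and the `eps` smoothing term. The sketch covers the loss only: `lda_topics` stands for the topic distribution LDA inferred for one document, and `dnn_logits` for the network's unnormalized scores on the same document.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(lda_topics, dnn_logits, eps=1e-12):
    """KL(lda_topics || softmax(dnn_logits)) for a single document."""
    student = softmax(dnn_logits)
    return sum(
        p * math.log((p + eps) / (q + eps))
        for p, q in zip(lda_topics, student)
        if p > 0.0  # terms with zero teacher mass contribute nothing
    )
```

The loss is zero when the network reproduces the LDA distribution exactly and grows as the two diverge. At test time only the forward pass of the network is needed, which is where the speed-up over iterative LDA inference would come from.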
