Xiaoli Z. Fern

Text Counterfactuals via Latent Optimization and Shapley-Guided Search

Oct 22, 2021

Quintin Pope, Xiaoli Z. Fern

Abstract:We study the problem of generating counterfactual text for a classifier as a means for understanding and debugging classification. Given a textual input and a classification model, we aim to minimally alter the text to change the model's prediction. White-box approaches have been successfully applied to similar problems in vision where one can directly optimize the continuous input. Optimization-based approaches become difficult in the language domain due to the discrete nature of text. We bypass this issue by directly optimizing in the latent space and leveraging a language model to generate candidate modifications from optimized latent representations. We additionally use Shapley values to estimate the combinatoric effect of multiple changes. We then use these estimates to guide a beam search for the final counterfactual text. We achieve favorable performance compared to recent white-box and black-box baselines using human and automatic evaluations. Ablation studies show that both latent optimization and the use of Shapley values improve success rate and the quality of the generated counterfactuals.

* 9 pages, 2 figures, 3 tables. Accepted at EMNLP 2021
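
The Shapley-value idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function `shapley_values`, the scorer `toy_score`, and the candidate edits are hypothetical, and the exact enumeration below is only practical for a handful of edits. It computes, for each candidate edit, its average marginal contribution to a score over all subsets of the other edits.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley value of each player under the set function `value`.

    `value` takes a frozenset of players and returns a number.
    The cost grows exponentially with len(players).
    """
    n = len(players)
    result = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        result[p] = total
    return result


def toy_score(edits):
    """Hypothetical stand-in for 'how far a set of edits moves the prediction'."""
    score = 0.0
    if "not" in edits:
        score += 0.5
    if "bad" in edits:
        score += 0.25
    if "not" in edits and "bad" in edits:
        score -= 0.25  # the two edits partly overlap in effect
    return score


phi = shapley_values(["not", "bad", "the"], toy_score)
for edit, contribution in phi.items():
    print(edit, round(contribution, 3))
```

In this toy setting the edit `"the"` never changes the score, so its value is zero, and the values of all edits sum to the score of applying every edit at once. A search procedure could rank candidate edits by such values; how the paper's beam search uses them is not specified in the abstract.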

The Devils in the Point Clouds: Studying the Robustness of Point Cloud Convolutions

Jan 28, 2021

Xingyi Li, Wenxuan Wu, Xiaoli Z. Fern, Li Fuxin

Abstract:Recently, there has been a significant interest in performing convolution over irregularly sampled point clouds. Since point clouds are very different from regular raster images, it is imperative to study the generalization of the convolution networks more closely, especially their robustness under variations in scale and rotations of the input data. This paper investigates different variants of PointConv, a convolution network on point clouds, to examine their robustness to input scale and rotation changes. Of the variants we explored, two are novel and generated significant improvements. The first is replacing the multilayer perceptron based weight function with much simpler third degree polynomials, together with a Sobolev norm regularization. Secondly, for 3D datasets, we derive a novel viewpoint-invariant descriptor by utilizing 3D geometric properties as the input to PointConv, in addition to the regular 3D coordinates. We have also explored choices of activation functions, neighborhood, and subsampling methods. Experiments are conducted on the 2D MNIST & CIFAR-10 datasets as well as the 3D SemanticKITTI & ScanNet datasets. Results reveal that on 2D, using third degree polynomials greatly improves PointConv's robustness to scale changes and rotations, even surpassing traditional 2D CNNs for the MNIST dataset. On 3D datasets, the novel viewpoint-invariant descriptor significantly improves the performance as well as robustness of PointConv. We achieve the state-of-the-art semantic segmentation performance on the SemanticKITTI dataset, as well as comparable performance with the current highest framework on the ScanNet dataset among point-based approaches.

Entity-aware ELMo: Learning Contextual Entity Representation for Entity Disambiguation

Aug 22, 2019

Hamed Shahbazi, Xiaoli Z. Fern, Reza Ghaeini, Rasha Obeidat, Prasad Tadepalli

Abstract: We present a new local entity disambiguation system. The key to our system is a novel approach for learning entity representations. In our approach, we learn an entity-aware extension of Embeddings from Language Models (ELMo), which we call Entity-ELMo (E-ELMo). Given a paragraph containing one or more named entity mentions, each mention is first represented as a function of the entire paragraph (including other mentions), and these representations are then used to predict the referent entities. Utilizing E-ELMo for local entity disambiguation, we outperform all of the state-of-the-art local and global models on the popular benchmarks, improving micro-average accuracy by about 0.5% on AIDA test-b with the Yago candidate set. For a fair comparison, the training data and candidate set are the same as those used by our baselines.

Saliency Learning: Teaching the Model Where to Pay Attention

Apr 04, 2019

Reza Ghaeini, Xiaoli Z. Fern, Hamed Shahbazi, Prasad Tadepalli

Abstract:Deep learning has emerged as a compelling solution to many NLP tasks with remarkable performances. However, due to their opacity, such models are hard to interpret and trust. Recent work on explaining deep models has introduced approaches to provide insights toward the model's behaviour and predictions, which are helpful for assessing the reliability of the model's predictions. However, such methods do not improve the model's reliability. In this paper, we aim to teach the model to make the right prediction for the right reason by providing explanation training and ensuring the alignment of the model's explanation with the ground truth explanation. Our experimental results on multiple tasks and datasets demonstrate the effectiveness of the proposed method, which produces more reliable predictions while delivering better results compared to traditionally trained models.

* Accepted as a short paper at NAACL 2019. 10 pages, 2 figures, 6 tables

Attentional Multi-Reading Sarcasm Detection

Sep 09, 2018

Reza Ghaeini, Xiaoli Z. Fern, Prasad Tadepalli

Abstract:Recognizing sarcasm often requires a deep understanding of multiple sources of information, including the utterance, the conversational context, and real world facts. Most of the current sarcasm detection systems consider only the utterance in isolation. There are some limited attempts toward taking into account the conversational context. In this paper, we propose an interpretable end-to-end model that combines information from both the utterance and the conversational context to detect sarcasm, and demonstrate its effectiveness through empirical evaluations. We also study the behavior of the proposed model to provide explanations for the model's decisions. Importantly, our model is capable of determining the impact of utterance and conversational context on the model's decisions. Finally, we provide an ablation study to illustrate the impact of different components of the proposed model.

Interpreting Recurrent and Attention-Based Neural Models: a Case Study on Natural Language Inference

Aug 12, 2018

Reza Ghaeini, Xiaoli Z. Fern, Prasad Tadepalli

Abstract:Deep learning models have achieved remarkable success in natural language inference (NLI) tasks. While these models are widely explored, they are hard to interpret and it is often unclear how and why they actually work. In this paper, we take a step toward explaining such deep learning based models through a case study on a popular neural model for NLI. In particular, we propose to interpret the intermediate layers of NLI models by visualizing the saliency of attention and LSTM gating signals. We present several examples for which our methods are able to reveal interesting insights and identify the critical information contributing to the model decisions.

* 11 pages, 11 figures, accepted as a short paper at EMNLP 2018

Joint Neural Entity Disambiguation with Output Space Search

Jun 19, 2018

Hamed Shahbazi, Xiaoli Z. Fern, Reza Ghaeini, Chao Ma, Rasha Obeidat, Prasad Tadepalli

Abstract: In this paper, we present a novel model for entity disambiguation that combines local contextual information and global evidence through Limited Discrepancy Search (LDS). Given an input document, we start from a complete solution constructed by a local model and search the space of possible corrections to improve the local solution from a global viewpoint. Our search utilizes a heuristic function to focus more on the least confident local decisions and a pruning function to score the global solutions based on their local fitness and the global coherence among the predicted entities. Experimental results on the CoNLL 2003 and TAC 2010 benchmarks verify the effectiveness of our model.

* Accepted as a long paper at COLING 2018, 11 pages
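
The search described in the abstract can be sketched in simplified form. This is an illustration under stated assumptions, not the paper's method: the function `limited_discrepancy_search`, its parameters `max_discrepancies` and `focus`, the score margin used as a confidence measure, and the toy scores are all hypothetical. It starts from the locally best entity for each mention, then tries a bounded number of corrections at the least confident mentions and keeps the assignment with the best global score.

```python
from itertools import combinations, product


def limited_discrepancy_search(local_scores, global_score,
                               max_discrepancies=2, focus=3):
    """local_scores: one dict per mention, mapping candidate -> local score.

    Returns the best assignment found and its global score.
    """
    # Complete initial solution: the top local candidate for each mention.
    initial = [max(s, key=s.get) for s in local_scores]
    best, best_val = initial, global_score(initial)

    # Assumed confidence measure: margin between the two best local scores.
    def margin(scores):
        top = sorted(scores.values(), reverse=True)
        return top[0] - top[1] if len(top) > 1 else float("inf")

    order = sorted(range(len(local_scores)),
                   key=lambda i: margin(local_scores[i]))
    focus_idx = order[:focus]  # only the least confident mentions are revisited

    for d in range(1, max_discrepancies + 1):
        for positions in combinations(focus_idx, d):
            alternatives = [
                [c for c in local_scores[i] if c != initial[i]]
                for i in positions
            ]
            for choice in product(*alternatives):
                candidate = list(initial)
                for i, c in zip(positions, choice):
                    candidate[i] = c
                val = global_score(candidate)
                if val > best_val:
                    best, best_val = candidate, val
    return best, best_val


# Toy example with two mentions; all scores are made up.
scores = [
    {"Paris_France": 0.55, "Paris_Texas": 0.45},
    {"Texas_state": 0.9, "Texas_band": 0.1},
]


def toy_global_score(assignment):
    local = sum(scores[i][c] for i, c in enumerate(assignment))
    coherent = assignment == ["Paris_Texas", "Texas_state"]
    return local + (0.5 if coherent else 0.0)


solution, value = limited_discrepancy_search(scores, toy_global_score)
print(solution, round(value, 2))
```

Bounding the number of corrections keeps the search close to the local solution, which is the stated motivation for revisiting only low-confidence decisions; the abstract's separate heuristic and pruning functions are collapsed here into the margin ordering and a single scoring function.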

Dependent Gated Reading for Cloze-Style Question Answering

Jun 01, 2018

Reza Ghaeini, Xiaoli Z. Fern, Hamed Shahbazi, Prasad Tadepalli

Abstract: We present a novel deep learning architecture to address the cloze-style question answering task. Existing approaches employ reading mechanisms that do not fully exploit the interdependency between the document and the query. In this paper, we propose a novel dependent gated reading bidirectional GRU network (DGR) to efficiently model the relationship between the document and the query during encoding and decision making. Our evaluation shows that DGR obtains highly competitive performance on well-known machine comprehension benchmarks such as the Children's Book Test (CBT-NE and CBT-CN) and Who Did What (WDW, Strict and Relaxed). Finally, we extensively analyze and validate our model by ablation and attention studies.

* Accepted as a long paper at COLING 2018, 16 pages, 12 figures

DR-BiLSTM: Dependent Reading Bidirectional LSTM for Natural Language Inference

Apr 11, 2018

Reza Ghaeini, Sadid A. Hasan, Vivek Datla, Joey Liu, Kathy Lee, Ashequl Qadir, Yuan Ling, Aaditya Prakash, Xiaoli Z. Fern, Oladimeji Farri

Abstract:We present a novel deep learning architecture to address the natural language inference (NLI) task. Existing approaches mostly rely on simple reading mechanisms for independent encoding of the premise and hypothesis. Instead, we propose a novel dependent reading bidirectional LSTM network (DR-BiLSTM) to efficiently model the relationship between a premise and a hypothesis during encoding and inference. We also introduce a sophisticated ensemble strategy to combine our proposed models, which noticeably improves final predictions. Finally, we demonstrate how the results can be improved further with an additional preprocessing step. Our evaluation shows that DR-BiLSTM obtains the best single model and ensemble model results achieving the new state-of-the-art scores on the Stanford NLI dataset.

* 18 pages, Accepted as a long paper at NAACL HLT 2018

Event Nugget Detection with Forward-Backward Recurrent Neural Networks

Feb 15, 2018

Reza Ghaeini, Xiaoli Z. Fern, Liang Huang, Prasad Tadepalli

Abstract: Traditional event detection methods rely heavily on manually engineered rich features. Recent deep learning approaches alleviate this problem through automatic feature engineering. But such efforts, like traditional methods, have so far focused only on single-token event mentions, whereas in practice an event can also be a phrase. We instead use forward-backward recurrent neural networks (FBRNNs) to detect events that can be either words or phrases. To the best of our knowledge, this is one of the first efforts to handle multi-word events and also the first attempt to use RNNs for event detection. Experimental results demonstrate that FBRNN is competitive with the state-of-the-art methods on the ACE 2005 and the Rich ERE 2015 event detection tasks.

* http://www.aclweb.org/anthology/P16-2060
* Published as a short paper at ACL 2016. The main purpose of this submission is to add this paper to arXiv
