Title: Polish Natural Language Inference and Factivity -- an Expert-based Dataset and Benchmarks

Jan 10, 2022

Daniel Ziembicki, Anna Wróblewska, Karolina Seweryn

Abstract: Despite recent breakthroughs in machine learning for natural language processing, Natural Language Inference (NLI) remains a challenge. To this end, we contribute a new dataset that focuses exclusively on the factivity phenomenon; the task itself is the same as in other NLI settings, i.e. predicting entailment, contradiction, or neutral (ECN). The dataset consists entirely of natural language utterances in Polish and comprises 2,432 verb-complement pairs with 309 unique verbs. It is based on the National Corpus of Polish (NKJP) and is a representative sample with regard to the frequency of main verbs and other linguistic features (e.g. occurrence of internal negation). We found that BERT-based transformer models operating on sentences obtained relatively good results ($\approx89\%$ F1 score). Although better results were achieved using linguistic features ($\approx91\%$ F1 score), that model requires more human labour (humans in the loop), because the features were prepared manually by expert linguists. The BERT-based models, which consume only the input sentences, capture most of the complexity of NLI/factivity. Complex cases of the phenomenon, e.g. cases with entailment (E) and non-factive verbs, remain an open issue for further research.
