Abstract: Although language model scores are often treated as probabilities, their reliability as probability estimators has mainly been studied through calibration, overlooking other aspects. In particular, it is unclear whether language models produce the same value for different ways of assigning joint probabilities to word spans. Our work introduces a novel framework, ConTestS (Consistency Testing over Spans), involving statistical tests to assess score consistency across interchangeable completion and conditioning orders. We conduct experiments on post-release real and synthetic data to eliminate training effects. Our findings reveal that both Masked Language Models (MLMs) and autoregressive models exhibit inconsistent predictions, with autoregressive models showing larger discrepancies. Larger MLMs tend to produce more consistent predictions, while autoregressive models show the opposite trend. Moreover, for both model types, prediction entropies offer insights into the true word span likelihood and can therefore aid in selecting optimal decoding strategies. The inconsistencies revealed by our analysis, as well as their connection to prediction entropies and differences between model types, can serve as useful guides for future research on addressing these limitations.
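As an illustration of the quantity being tested (not the ConTestS procedure itself), the sketch below scores a two-token span under an MLM using the two interchangeable completion orders; the model name, example sentence, and candidate span are assumptions made only for this example.

    import torch
    from transformers import AutoTokenizer, AutoModelForMaskedLM

    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
    mlm.eval()

    def cond_log_prob(token_ids, pos, target_id):
        # log P(token at `pos` = target_id | all other tokens), one MLM forward pass
        ids = token_ids.clone()
        ids[0, pos] = tok.mask_token_id
        with torch.no_grad():
            logits = mlm(input_ids=ids).logits
        return torch.log_softmax(logits[0, pos], dim=-1)[target_id].item()

    text = "She moved to [MASK] [MASK] last year."
    enc = tok(text, return_tensors="pt")
    w1_pos, w2_pos = (enc.input_ids[0] == tok.mask_token_id).nonzero().flatten().tolist()
    w1_id, w2_id = tok.convert_tokens_to_ids(["new", "york"])  # illustrative candidate span

    both_masked = enc.input_ids
    with_w1 = both_masked.clone(); with_w1[0, w1_pos] = w1_id
    with_w2 = both_masked.clone(); with_w2[0, w2_pos] = w2_id

    # Order A: complete w1 first, then w2: log P(w1 | c) + log P(w2 | c, w1)
    order_a = cond_log_prob(both_masked, w1_pos, w1_id) + cond_log_prob(with_w1, w2_pos, w2_id)
    # Order B: complete w2 first, then w1: log P(w2 | c) + log P(w1 | c, w2)
    order_b = cond_log_prob(both_masked, w2_pos, w2_id) + cond_log_prob(with_w2, w1_pos, w1_id)

    print(order_a, order_b)  # a self-consistent probability model would give equal values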
Abstract: Distribution shifts between training and deployment data often affect the performance of machine learning models. In this paper, we explore a setting where a hidden variable induces a shift in the distribution of classes. These distribution shifts are particularly challenging for zero-shot classifiers, as they rely on representations learned from training classes, but are deployed on new, unseen ones. We introduce an algorithm to learn data representations that are robust to such class distribution shifts in zero-shot verification tasks. We show that our approach, which combines hierarchical data sampling with out-of-distribution generalization techniques, improves generalization to diverse class distributions in both simulations and real-world datasets.
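The abstract only names the ingredients, so the following is a speculative sketch of what hierarchical sampling for a verification task could look like: draw a value of the hidden, shift-inducing variable first, then classes within it, then examples. The data layout and function names are invented for illustration and are not the paper's implementation.

    import random
    from collections import defaultdict

    def hierarchical_batch(examples, n_pairs, rng=random):
        # examples: list of (hidden_attr, class_id, x) triples.
        # Returns verification pairs sampled so that every hidden-attribute group
        # is drawn with equal probability, regardless of its size in the data.
        by_attr = defaultdict(lambda: defaultdict(list))
        for attr, cls, x in examples:
            by_attr[attr][cls].append(x)

        pairs, attrs = [], list(by_attr)
        for _ in range(n_pairs):
            attr = rng.choice(attrs)                       # level 1: hidden variable
            classes = list(by_attr[attr])
            c1 = rng.choice(classes)                       # level 2: class within the group
            if rng.random() < 0.5 and len(by_attr[attr][c1]) >= 2:
                x1, x2 = rng.sample(by_attr[attr][c1], 2)  # level 3: "same class" pair
                label = 1
            else:
                c2 = rng.choice(classes)
                x1 = rng.choice(by_attr[attr][c1])         # level 3: "different class" pair
                x2 = rng.choice(by_attr[attr][c2])
                label = int(c1 == c2)
            pairs.append((x1, x2, label))
        return pairs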
Abstract: Multiclass classifiers are often designed and evaluated only on a sample from the classes on which they will eventually be applied. Hence, their final accuracy remains unknown. In this work, we study how a classifier's performance over the initial class sample can be used to extrapolate its expected accuracy on a larger, unobserved set of classes. For this, we define a measure of separation between correct and incorrect classes that is independent of the number of classes: the reversed ROC (rROC), which is obtained by replacing the roles of classes and data-points in the common ROC. We show that the classification accuracy is a function of the rROC in multiclass classifiers for which the learned representation of data from the initial class sample remains unchanged when new classes are added. Using these results, we formulate a robust neural-network-based algorithm, CleaneX, which learns to estimate the accuracy of such classifiers on arbitrarily large sets of classes. Our method achieves remarkably better predictions than current state-of-the-art methods on both simulations and real datasets of object detection, face recognition, and brain decoding.
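One possible reading of the rROC construction described above, sketched in code (this is an interpretation of the abstract, not the authors' reference implementation): pool the (data-point, class) scores, mark each point's correct class as the positive, and trace the ROC over classes rather than over data points.

    import numpy as np

    def reversed_roc(scores, correct):
        # scores: (n_points, n_classes) confidence matrix; correct: (n_points,) true class index.
        # Returns (fpr, tpr) of the curve separating correct from incorrect classes.
        n = scores.shape[0]
        is_correct = np.zeros_like(scores, dtype=bool)
        is_correct[np.arange(n), correct] = True

        order = np.argsort(-scores.ravel())      # sweep a score threshold from high to low
        y = is_correct.ravel()[order]
        tpr = np.cumsum(y) / y.sum()             # fraction of correct classes above the threshold
        fpr = np.cumsum(~y) / (~y).sum()         # fraction of incorrect classes above the threshold
        return fpr, tpr

    # toy usage: a 5-way classifier scored on 100 points, true class boosted on average
    rng = np.random.default_rng(0)
    scores = rng.normal(size=(100, 5))
    correct = rng.integers(0, 5, size=100)
    scores[np.arange(100), correct] += 1.0
    fpr, tpr = reversed_roc(scores, correct)
    print("area under this rROC:", np.trapz(tpr, fpr))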