Abstract: The goal of a summary is to concisely state the most important information in a document. With this principle in mind, we introduce new reference-free summary evaluation metrics that use a pretrained language model to estimate the information shared between a document and its summary. These metrics are a modern take on the Shannon Game, a method for summary quality scoring proposed decades ago, where we replace human annotators with language models. We also view these metrics as an extension of BLANC, a recently proposed approach to summary quality measurement based on the performance of a language model with and without the help of a summary. Using GPT-2, we empirically verify that the introduced metrics correlate with human judgement based on coverage, overall quality, and five summary dimensions.
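As an illustrative sketch of this idea (not the paper's exact metric), one can use GPT-2's token log-probabilities to estimate how much a summary reduces the model's surprise about its document, i.e. log p(document | summary) − log p(document). The model choice and scoring details below are assumptions made for the example.

```python
# A minimal sketch, assuming a shared-information score of the form
# log p(doc | summary) - log p(doc), computed with GPT-2 token log-probabilities.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def doc_log_prob(document: str, prefix: str = "") -> float:
    """Sum of log-probabilities GPT-2 assigns to the document tokens,
    optionally conditioned on a prefix such as the summary."""
    doc_ids = tokenizer(document, return_tensors="pt").input_ids
    if prefix:
        prefix_ids = tokenizer(prefix, return_tensors="pt").input_ids
        input_ids = torch.cat([prefix_ids, doc_ids], dim=1)
        n_prefix = prefix_ids.shape[1]
    else:
        input_ids = doc_ids
        n_prefix = 1  # the first document token has no left context, so it is not scored
    with torch.no_grad():
        logits = model(input_ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)   # position i predicts token i+1
    targets = input_ids[0, 1:]
    token_scores = log_probs[torch.arange(targets.shape[0]), targets]
    return token_scores[n_prefix - 1:].sum().item()          # keep only document-token scores

def information_score(document: str, summary: str) -> float:
    """Higher when the summary makes the document easier for GPT-2 to predict."""
    return doc_log_prob(document, prefix=summary) - doc_log_prob(document)
```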
Abstract: The prevalence of ambiguous acronyms makes scientific documents harder to understand for humans and machines alike, presenting a need for models that can automatically identify acronyms in text and disambiguate their meaning. We introduce new methods for acronym identification and disambiguation: our acronym identification model projects learned token embeddings onto tag predictions, and our acronym disambiguation model finds training examples whose sentence embeddings are similar to those of test examples. Both of our systems achieve significant performance gains over previously suggested methods, and perform competitively on the SDU@AAAI-21 shared task leaderboard. Our models were trained in part on new distantly supervised datasets for these tasks, which we call AuxAI and AuxAD. We also identified a duplication conflict issue in the SciAD dataset, and formed a deduplicated version of SciAD that we call SciAD-dedupe. We publicly release all three of these datasets and hope that they help the community make further strides in scientific document understanding.
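As a hedged illustration of the disambiguation component only (not the exact system described above), the sketch below labels a test sentence with the expansion of the training sentence that is closest in a sentence-embedding space. The encoder name and the toy examples are assumptions made for the example.

```python
# A minimal sketch of nearest-neighbour acronym disambiguation via sentence embeddings.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")   # placeholder encoder choice

# Toy "training" sentences, each labelled with the expansion of the acronym CNN.
train_sentences = [
    "The CNN reached 92% accuracy on the image benchmark.",
    "CNN reported the election results on Tuesday night.",
]
train_expansions = ["convolutional neural network", "Cable News Network"]

train_emb = encoder.encode(train_sentences, normalize_embeddings=True)

def disambiguate(sentence: str) -> str:
    """Return the expansion of the training example closest in embedding space."""
    query = encoder.encode([sentence], normalize_embeddings=True)
    sims = train_emb @ query[0]          # cosine similarity (embeddings are unit norm)
    return train_expansions[int(np.argmax(sims))]

print(disambiguate("We trained a CNN with three convolutional layers."))
```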
Abstract: We explore the sensitivity of a document summary quality estimator, BLANC, to human assessments of quality for the same summaries. In our human evaluations, we distinguish five summary qualities, defined by how fluent, understandable, informative, compact, and factually correct the summary is. We make the case for optimal BLANC parameters, at which BLANC's sensitivity to almost all of the summary qualities is about as good as that of a human annotator.
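A minimal sketch of the kind of comparison this involves, using Spearman rank correlation as a stand-in for the paper's sensitivity measure and placeholder numbers in place of real scores:

```python
# Compare an automatic summary score (e.g. BLANC) with human ratings of one quality.
# The arrays below are placeholder values, not data from the paper.
from scipy.stats import spearmanr

blanc_scores = [0.12, 0.31, 0.05, 0.27, 0.19]   # automatic score per summary
human_fluency = [3.0, 4.5, 2.0, 4.0, 3.5]       # human rating of fluency per summary

rho, p_value = spearmanr(blanc_scores, human_fluency)
print(f"Spearman rho={rho:.2f}, p={p_value:.3f}")
```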
Abstract: The Generator of a Generative Adversarial Network (GAN) is trained to transform latent vectors drawn from a prior distribution into realistic-looking photos. These latent vectors have been shown to encode information about the content of their corresponding images. Projecting input images onto the latent space of a GAN is non-trivial, but previous work has successfully performed this task for latent spaces with a uniform prior. We extend these techniques to latent spaces with a Gaussian prior and demonstrate our technique's effectiveness.
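A minimal sketch of one common way to perform such a projection (an assumed approach, not necessarily the exact procedure used here): optimize a latent vector by gradient descent on the reconstruction error, with an L2 term that keeps it plausible under the Gaussian prior. The `generator` argument and the hyperparameters are placeholders.

```python
# Project a target image onto a GAN's Gaussian latent space by gradient descent.
import torch

def project_to_latent(generator, target_image, latent_dim=512,
                      steps=1000, lr=0.05, prior_weight=1e-3):
    z = torch.randn(1, latent_dim, requires_grad=True)    # start from the prior
    optimizer = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        reconstruction = generator(z)
        loss = torch.mean((reconstruction - target_image) ** 2)   # pixel reconstruction loss
        loss = loss + prior_weight * torch.mean(z ** 2)           # Gaussian prior penalty
        loss.backward()
        optimizer.step()
    return z.detach()
```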