Genady Beryozkin

Using Text Injection to Improve Recognition of Personal Identifiers in Speech

Aug 14, 2023

Yochai Blau, Rohan Agrawal, Lior Madmony, Gary Wang, Andrew Rosenberg, Zhehuai Chen, Zorik Gekhman, Genady Beryozkin, Parisa Haghani, Bhuvana Ramabhadran

Abstract: Accurate recognition of specific categories, such as persons' names, dates or other identifiers is critical in many Automatic Speech Recognition (ASR) applications. As these categories represent personal information, ethical use of this data including collection, transcription, training and evaluation demands special care. One way of ensuring the security and privacy of individuals is to redact or eliminate Personally Identifiable Information (PII) from collection altogether. However, this results in ASR models that tend to have lower recognition accuracy of these categories. We use text-injection to improve the recognition of PII categories by including fake textual substitutes of PII categories in the training data using a text injection method. We demonstrate substantial improvement to Recall of Names and Dates in medical notes while improving overall WER. For alphanumeric digit sequences we show improvements to Character Error Rate and Sentence Accuracy.

* Accepted to Interspeech 2023
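
The two metrics named for alphanumeric sequences can be illustrated with a minimal sketch. It assumes the conventional definitions (character-level Levenshtein distance divided by reference length, and exact-match rate); the paper's text normalization and exact scoring setup are not given in the abstract, and the function names and sample strings here are hypothetical.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference character
                            curr[j - 1] + 1,      # insert a hypothesis character
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(refs, hyps):
    """Total character edits divided by total reference characters."""
    edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    chars = sum(len(r) for r in refs)
    return edits / chars if chars else 0.0


def sentence_accuracy(refs, hyps):
    """Fraction of hypotheses that match their reference exactly."""
    if not refs:
        return 0.0
    return sum(r == h for r, h in zip(refs, hyps)) / len(refs)


# Hypothetical alphanumeric sequences: one exact match, one single-character error.
refs = ["a1b2c3", "7x9"]
hyps = ["a1b2c3", "7x8"]
cer = character_error_rate(refs, hyps)   # 1 edit over 9 reference characters
acc = sentence_accuracy(refs, hyps)      # 0.5
```

Sentence accuracy is the stricter of the two: a single wrong character in an identifier counts the whole sequence as incorrect, whereas CER only charges one edit.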

RED-ACE: Robust Error Detection for ASR using Confidence Embeddings

Mar 14, 2022

Zorik Gekhman, Dina Zverinski, Jonathan Mallinson, Genady Beryozkin

Abstract: ASR Error Detection (AED) models aim to post-process the output of Automatic Speech Recognition (ASR) systems, in order to detect transcription errors. Modern approaches usually use text-based input, comprised solely of the ASR transcription hypothesis, disregarding additional signals from the ASR model. Instead, we propose to utilize the ASR system's word-level confidence scores for improving AED performance. Specifically, we add an ASR Confidence Embedding (ACE) layer to the AED model's encoder, allowing us to jointly encode the confidence scores and the transcribed text into a contextualized representation. Our experiments show the benefits of ASR confidence scores for AED, their complementary effect over the textual signal, as well as the effectiveness and robustness of ACE for combining these signals. To foster further research, we publish a novel AED dataset consisting of ASR outputs on the LibriSpeech corpus with annotated transcription errors.
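
One way to picture a confidence embedding layer is sketched below. This is an illustration under stated assumptions, not the paper's implementation: the number of bins, the uniform quantization, and the element-wise addition to the token embedding are all assumptions, and the tiny hand-written tables stand in for learned parameters.

```python
NUM_BINS = 10  # assumption: confidence scores are quantized into 10 uniform bins


def confidence_bin(score: float, num_bins: int = NUM_BINS) -> int:
    """Map a confidence in [0, 1] to an integer bin in [0, num_bins - 1]."""
    clipped = min(max(score, 0.0), 1.0)
    return min(int(clipped * num_bins), num_bins - 1)


def embed(token_ids, confidences, token_table, conf_table):
    """Add each token's embedding to the embedding of its confidence bin."""
    out = []
    for tok, conf in zip(token_ids, confidences):
        tok_vec = token_table[tok]
        conf_vec = conf_table[confidence_bin(conf)]
        out.append([t + c for t, c in zip(tok_vec, conf_vec)])
    return out


# Hypothetical 2-dimensional tables; in a real model these would be learned.
token_table = [[1.0, 0.0]]                                 # one token in the vocabulary
conf_table = [[0.0, float(b)] for b in range(NUM_BINS)]    # one row per confidence bin

# The same word transcribed with high and with low confidence
# yields two different input vectors for the encoder.
vectors = embed([0, 0], [0.95, 0.15], token_table, conf_table)
```

Because the confidence signal enters at the input, the encoder can contextualize it together with the text rather than treating it as a separate feature after encoding.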

KoBE: Knowledge-Based Machine Translation Evaluation

Sep 23, 2020

Zorik Gekhman, Roee Aharoni, Genady Beryozkin, Markus Freitag, Wolfgang Macherey

Abstract: We propose a simple and effective method for machine translation evaluation which does not require reference translations. Our approach is based on (1) grounding the entity mentions found in each source sentence and candidate translation against a large-scale multilingual knowledge base, and (2) measuring the recall of the grounded entities found in the candidate vs. those found in the source. Our approach achieves the highest correlation with human judgements on 9 out of the 18 language pairs from the WMT19 benchmark for evaluation without references, which is the largest number of wins for a single evaluation method on this task. On 4 language pairs, we also achieve higher correlation with human judgements than BLEU. To foster further research, we release a dataset containing 1.8 million grounded entity mentions across 18 language pairs from the WMT19 metrics track data.

* Accepted as a short paper in Findings of EMNLP 2020
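
Step (2) of the approach, recall of grounded entities, can be sketched as follows. The sketch assumes entity linking has already mapped each mention to a language-independent knowledge-base ID; the IDs shown are hypothetical, and the multiset matching and the handling of entity-free sources are assumptions rather than details taken from the paper.

```python
from collections import Counter


def entity_recall(source_entities, candidate_entities):
    """Share of source-side entity IDs that also appear in the candidate.

    Both arguments are lists of knowledge-base IDs; repeated mentions are
    counted as a multiset, so a duplicated entity must be matched twice.
    """
    src = Counter(source_entities)
    cand = Counter(candidate_entities)
    total = sum(src.values())
    if total == 0:
        return 1.0  # assumption: a source with no entities has nothing to miss
    matched = sum((src & cand).values())  # multiset intersection
    return matched / total


# Hypothetical IDs: the source mentions one entity twice, the candidate only once.
source_ids = ["Q90", "Q142", "Q90"]
candidate_ids = ["Q90", "Q142"]
score = entity_recall(source_ids, candidate_ids)  # 2 of 3 source mentions recovered
```

Since the comparison is between IDs rather than surface strings, no reference translation is needed: the source sentence itself supplies the entities the candidate is expected to preserve.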

A Joint Named-Entity Recognizer for Heterogeneous Tag-sets Using a Tag Hierarchy

May 22, 2019

Genady Beryozkin, Yoel Drori, Oren Gilon, Tzvika Hartman, Idan Szpektor

Abstract: We study a variant of domain adaptation for named-entity recognition where multiple, heterogeneously tagged training sets are available. Furthermore, the test tag-set is not identical to any individual training tag-set. Yet, the relations between all tags are provided in a tag hierarchy, covering the test tags as a combination of training tags. This setting occurs when various datasets are created using different annotation schemes. This is also the case of extending a tag-set with a new tag by annotating only the new tag in a new dataset. We propose to use the given tag hierarchy to jointly learn a neural network that shares its tagging layer among all tag-sets. We compare this model to combining independent models and to a model based on the multitasking approach. Our experiments show the benefit of the tag-hierarchy model, especially when facing non-trivial consolidation of tag-sets.

* Accepted at ACL 2019
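
A shared tagging layer over a tag hierarchy can be illustrated with a small sketch. It assumes one plausible realization: the model predicts a distribution over fine-grained tags, and a coarser tag from a given dataset is scored by summing the probabilities of the fine-grained tags beneath it. The abstract does not spell out this mechanism, and the tag names and hierarchy below are hypothetical.

```python
# Hypothetical hierarchy: each dataset-level (coarse) tag maps to the
# fine-grained tags it covers in the shared tagging layer.
HIERARCHY = {
    "NAME": ["FIRST_NAME", "LAST_NAME"],
    "LOCATION": ["CITY", "STREET"],
    "O": ["O"],
}


def project_to_tagset(fine_probs, hierarchy):
    """Score each coarse tag as the sum of the probabilities of its fine tags."""
    return {
        coarse: sum(fine_probs.get(fine, 0.0) for fine in fine_tags)
        for coarse, fine_tags in hierarchy.items()
    }


# Hypothetical output of the shared layer for a single token.
fine_probs = {
    "FIRST_NAME": 0.5,
    "LAST_NAME": 0.25,
    "CITY": 0.125,
    "STREET": 0.0,
    "O": 0.125,
}
coarse_probs = project_to_tagset(fine_probs, HIERARCHY)
# coarse_probs == {"NAME": 0.75, "LOCATION": 0.125, "O": 0.125}
```

Under this assumption, a dataset annotated only with `NAME` can still supervise the shared layer: the loss is computed on the summed probability, leaving the split between `FIRST_NAME` and `LAST_NAME` to be learned from datasets that distinguish them.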

Audio De-identification: A New Entity Recognition Task

Mar 17, 2019

Ido Cohn, Itay Laish, Genady Beryozkin, Gang Li, Izhak Shafran, Idan Szpektor, Tzvika Hartman, Avinatan Hassidim, Yossi Matias

Abstract: Named Entity Recognition (NER) has been mostly studied in the context of written text. Specifically, NER is an important step in de-identification (de-ID) of medical records, many of which are recorded conversations between a patient and a doctor. In such recordings, audio spans with personal information should be redacted, similar to the redaction of sensitive character spans in de-ID for written text. The application of NER in the context of audio de-identification has yet to be fully investigated. To this end, we define the task of audio de-ID, in which audio spans with entity mentions should be detected. We then present our pipeline for this task, which involves Automatic Speech Recognition (ASR), NER on the transcript text, and text-to-audio alignment. Finally, we introduce a novel metric for audio de-ID and a new evaluation benchmark consisting of a large labeled segment of the Switchboard and Fisher audio datasets and detail our pipeline's results on it.

* Accepted to NAACL 2019 Industry Track
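
The last pipeline stage, mapping entity mentions in the transcript back to audio, can be sketched under simple assumptions: the ASR system returns per-word timestamps, and the NER step returns inclusive token-index spans over the same words. The `Word` type, the function name, and the sample data are hypothetical; the paper's actual alignment method is not described in the abstract.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the beginning of the recording
    end: float


def entity_audio_spans(words, entity_token_spans):
    """Convert inclusive (first, last) token spans to (start, end) times in seconds."""
    return [(words[first].start, words[last].end)
            for first, last in entity_token_spans]


# Hypothetical ASR output with word-level timestamps.
words = [
    Word("my", 0.0, 0.2),
    Word("name", 0.2, 0.5),
    Word("is", 0.5, 0.6),
    Word("jane", 0.6, 1.0),
    Word("doe", 1.0, 1.4),
]

# Hypothetical NER output: tokens 3..4 form a single name mention.
spans_to_redact = entity_audio_spans(words, [(3, 4)])
# spans_to_redact == [(0.6, 1.4)]
```

Errors compound across the stages: a misrecognized name may be missed by NER, and imprecise timestamps shift the redacted span, which is one reason an audio-level metric is useful in addition to text-level NER scores.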
