Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Alessia Battisti

Modelling the Distribution of Human Motion for Sign Language Assessment

Aug 19, 2024

Oliver Cory, Ozge Mercanoglu Sincan, Matthew Vowels, Alessia Battisti, Franz Holzknecht, Katja Tissi, Sandra Sidler-Miserez, Tobias Haug, Sarah Ebling, Richard Bowden

Figure 1 for Modelling the Distribution of Human Motion for Sign Language Assessment

Figure 2 for Modelling the Distribution of Human Motion for Sign Language Assessment

Figure 3 for Modelling the Distribution of Human Motion for Sign Language Assessment

Figure 4 for Modelling the Distribution of Human Motion for Sign Language Assessment

Abstract:Sign Language Assessment (SLA) tools are useful to aid in language learning and are underdeveloped. Previous work has focused on isolated signs or comparison against a single reference video to assess Sign Languages (SL). This paper introduces a novel SLA tool designed to evaluate the comprehensibility of SL by modelling the natural distribution of human motion. We train our pipeline on data from native signers and evaluate it using SL learners. We compare our results to ratings from a human raters study and find strong correlation between human ratings and our tool. We visually demonstrate our tools ability to detect anomalous results spatio-temporally, providing actionable feedback to aid in SL learning and assessment.

* Accepted to Twelfth International Workshop on Assistive Computer Vision and Robotics at ECCV 2024

Via

Access Paper or Ask Questions

Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets

Mar 22, 2021

Isaac Caswell, Julia Kreutzer, Lisa Wang, Ahsan Wahab, Daan van Esch, Nasanbayar Ulzii-Orshikh, Allahsera Tapo, Nishant Subramani, Artem Sokolov, Claytone Sikasote(+42 more)

Figure 1 for Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets

Figure 2 for Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets

Figure 3 for Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets

Figure 4 for Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets

Abstract:With the success of large-scale pre-training and multilingual modeling in Natural Language Processing (NLP), recent years have seen a proliferation of large, web-mined text datasets covering hundreds of languages. However, to date there has been no systematic analysis of the quality of these publicly available datasets, or whether the datasets actually contain content in the languages they claim to represent. In this work, we manually audit the quality of 205 language-specific corpora released with five major public datasets (CCAligned, ParaCrawl, WikiMatrix, OSCAR, mC4), and audit the correctness of language codes in a sixth (JW300). We find that lower-resource corpora have systematic issues: at least 15 corpora are completely erroneous, and a significant fraction contains less than 50% sentences of acceptable quality. Similarly, we find 82 corpora that are mislabeled or use nonstandard/ambiguous language codes. We demonstrate that these issues are easy to detect even for non-speakers of the languages in question, and supplement the human judgements with automatic analyses. Inspired by our analysis, we recommend techniques to evaluate and improve multilingual corpora and discuss the risks that come with low-quality data releases.

* 10 pages paper; 10 pages appendix; AfricaNLP 2021

Via

Access Paper or Ask Questions

A Corpus for Automatic Readability Assessment and Text Simplification of German

Sep 19, 2019

Alessia Battisti, Sarah Ebling

Figure 1 for A Corpus for Automatic Readability Assessment and Text Simplification of German

Figure 2 for A Corpus for Automatic Readability Assessment and Text Simplification of German

Figure 3 for A Corpus for Automatic Readability Assessment and Text Simplification of German

Abstract:In this paper, we present a corpus for use in automatic readability assessment and automatic text simplification of German. The corpus is compiled from web sources and consists of approximately 211,000 sentences. As a novel contribution, it contains information on text structure, typography, and images, which can be exploited as part of machine learning approaches to readability assessment and text simplification. The focus of this publication is on representing such information as an extension to an existing corpus standard.

Via

Access Paper or Ask Questions