Sudeep Choudhary

iSign: A Benchmark for Indian Sign Language Processing

Jul 07, 2024

Abhinav Joshi, Romit Mohanty, Mounika Kanakanti, Andesha Mangla, Sudeep Choudhary, Monali Barbate, Ashutosh Modi

Abstract: Indian Sign Language has limited resources for developing machine learning and data-driven approaches for automated language processing. Though text/audio-based language processing techniques have shown colossal research interest and tremendous improvements in the last few years, Sign Languages still need to catch up due to the need for more resources. To bridge this gap, in this work, we propose iSign: a benchmark for Indian Sign Language (ISL) Processing. We make three primary contributions in this work. First, we release one of the largest ISL-English datasets with more than 118K video-sentence/phrase pairs. To the best of our knowledge, it is the largest sign language dataset available for ISL. Second, we propose multiple NLP-specific tasks (including SignVideo2Text, SignPose2Text, Text2Pose, Word Prediction, and Sign Semantics) and benchmark them with the baseline models for easier access to the research community. Third, we provide detailed insights into the proposed benchmarks with a few linguistic insights into the workings of ISL. We streamline the evaluation of Sign Language processing, addressing the gaps in the NLP research community for Sign Languages. We release the dataset, tasks, and models via the following website: https://exploration-lab.github.io/iSign/

* Accepted at ACL 2024 Findings. 18 Pages (9 Pages + References + Appendix)

DCoM: A Deep Column Mapper for Semantic Data Type Detection

Jun 24, 2021

Subhadip Maji, Swapna Sourav Rout, Sudeep Choudhary

Abstract: Detection of semantic data types is a crucial task in data science for automated data cleaning, schema matching, data discovery, semantic data type normalization, and sensitive data identification. Existing methods include regular expression-based or dictionary lookup-based methods that are not robust to dirty as well as unseen data and are limited to a very small number of semantic data types. Existing machine learning methods extract a large number of engineered features from data and build a logistic regression, random forest, or feedforward neural network for this purpose. In this paper, we introduce DCoM, a collection of multi-input NLP-based deep neural networks to detect semantic data types where, instead of extracting a large number of features from the data, we feed the raw values of columns (or instances) to the model as texts. We train DCoM on 686,765 data columns extracted from the VizNet corpus with 78 different semantic data types. DCoM outperforms other contemporary results by a significant margin on the same dataset.

* 9 pages, 2 figures, 7 tables
