Abstract: We introduce a method for transferring the signer's appearance in sign language skeletal poses while preserving the sign content. Using estimated poses, we transfer the appearance of one signer to another, maintaining natural movements and transitions. This approach improves pose-based rendering and sign stitching while obfuscating identity. Our experiments show that while the method reduces signer identification accuracy, it slightly harms sign recognition performance, highlighting a tradeoff between privacy and utility. Our code is available at \url{https://github.com/sign-language-processing/pose-anonymization}.
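To make the idea concrete, here is a minimal sketch of appearance transfer on estimated skeletal poses. It assumes poses are plain NumPy arrays and approximates a signer's appearance by their mean pose; the helper name and shapes are illustrative, and the released repository may implement the transfer differently.

```python
# A minimal sketch of appearance transfer on skeletal poses (hypothetical
# helper name and shapes; the released code may differ in detail).
import numpy as np

def transfer_appearance(source_poses: np.ndarray,
                        target_mean_pose: np.ndarray) -> np.ndarray:
    """Replace the source signer's appearance with a target appearance.

    source_poses: array of shape (frames, keypoints, dims), estimated poses.
    target_mean_pose: array of shape (keypoints, dims), the target signer's
        mean pose, standing in for their "appearance".
    """
    # Approximate the source signer's appearance by their mean pose.
    source_mean_pose = source_poses.mean(axis=0)
    # Keep only the motion (the sign content) relative to that appearance...
    motion = source_poses - source_mean_pose
    # ...and re-anchor it on the target signer's appearance.
    return motion + target_mean_pose

# Usage with random stand-ins for real pose estimates:
poses = np.random.rand(120, 137, 2)   # 120 frames, 137 keypoints, 2D
target = np.random.rand(137, 2)       # target signer's mean pose
anonymized = transfer_appearance(poses, target)
```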
Abstract: We present SignCLIP, which re-purposes CLIP (Contrastive Language-Image Pretraining) to project spoken language text and sign language videos, two classes of natural languages of distinct modalities, into the same space. SignCLIP is an efficient method of learning useful visual representations for sign language processing from large-scale, multilingual video-text pairs, without directly optimizing for a specific task or sign language, for which data is often limited in size. We pretrain SignCLIP on Spreadthesign, a prominent sign language dictionary consisting of around 500,000 video clips in up to 44 sign languages, and evaluate it on various downstream datasets. SignCLIP discerns in-domain signing with notable text-to-video/video-to-text retrieval accuracy. It also performs competitively on out-of-domain downstream tasks such as isolated sign language recognition with only essential few-shot prompting or fine-tuning. We analyze the latent space formed by the spoken language text and sign language poses, which provides additional linguistic insights. Our code and models are openly available.
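As an illustration of the contrastive setup, the sketch below shows a CLIP-style symmetric loss between text and sign video (or pose) embeddings. The encoders are omitted, and the batch size, embedding dimension, and temperature are stand-ins rather than SignCLIP's actual configuration.

```python
# A minimal sketch of the CLIP-style contrastive objective SignCLIP reuses:
# matched text/video embeddings are pulled together, mismatched pairs pushed
# apart. Only the loss structure is shown; encoders are placeholders.
import torch
import torch.nn.functional as F

def clip_contrastive_loss(text_emb: torch.Tensor,
                          video_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    # Project both modalities onto the unit sphere of the shared space.
    text_emb = F.normalize(text_emb, dim=-1)
    video_emb = F.normalize(video_emb, dim=-1)
    # Pairwise cosine similarities, scaled by a temperature.
    logits = text_emb @ video_emb.t() / temperature
    # The i-th text matches the i-th video: symmetric cross-entropy.
    targets = torch.arange(logits.size(0))
    loss_t2v = F.cross_entropy(logits, targets)
    loss_v2t = F.cross_entropy(logits.t(), targets)
    return (loss_t2v + loss_v2t) / 2

# Usage with random stand-ins for encoder outputs (batch of 8, 512-d space):
loss = clip_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
```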
Abstract: Sign language segmentation is a crucial task in sign language processing systems. It enables downstream tasks such as sign recognition, transcription, and machine translation. In this work, we consider two kinds of segmentation: segmentation into individual signs and segmentation into phrases, larger units comprising several signs. We propose a novel approach to jointly model these two tasks. Our method is motivated by linguistic cues observed in sign language corpora. We replace the predominant IO tagging scheme with BIO tagging to account for continuous signing. Given that prosody plays a significant role in phrase boundaries, we explore the use of optical flow features. We also provide an extensive analysis of hand shapes and 3D hand normalization. We find that introducing BIO tagging is necessary to model sign boundaries. Explicitly encoding prosody by optical flow improves segmentation in shallow models, but its contribution is negligible in deeper models. Careful tuning of the decoding algorithm atop the models further improves the segmentation quality. We demonstrate that our final models generalize to out-of-domain video content in a different signed language, even under a zero-shot setting. We observe that including optical flow and 3D hand normalization enhances the robustness of the model in this context.
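To illustrate why BIO tagging matters for continuous signing, here is a toy greedy decode of per-frame B/I/O tags into sign spans. The paper's models decode predicted probabilities with tuned algorithms; this sketch only demonstrates the tagging scheme itself.

```python
# With IO tagging, two back-to-back signs collapse into one span; BIO tagging
# recovers the boundary because every sign opens with a 'B' tag.
from typing import List, Tuple

def bio_to_segments(tags: List[str]) -> List[Tuple[int, int]]:
    """Convert per-frame tags ('B', 'I', 'O') into (start, end) frame spans."""
    segments, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B":                      # a 'B' always opens a new sign
            if start is not None:
                segments.append((start, i))
            start = i
        elif tag == "O" and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(tags)))
    return segments

# Two signs with no pause in between: the second 'B' keeps them apart.
print(bio_to_segments(["O", "B", "I", "I", "B", "I", "O"]))  # [(1, 4), (4, 6)]
```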
Abstract: This work advances the field of sign language machine translation by focusing on dataset quality and simplification of the translation system. We introduce SignBank+, a clean version of the SignBank dataset, optimized for machine translation. Contrary to previous works that employ complex factorization techniques for translation, we advocate for a simplified text-to-text translation approach. Our evaluation shows that models trained on SignBank+ surpass those on the original dataset, establishing a new benchmark and providing an open resource for future research.
Abstract: Sign language translation systems are complex and require many components. As a result, it is very hard to compare methods across publications. We present an open-source implementation of a text-to-gloss-to-pose-to-video pipeline approach, demonstrating conversion from German to Swiss German Sign Language, French to French Sign Language of Switzerland, and Italian to Italian Sign Language of Switzerland. We propose three different components for the text-to-gloss translation: a lemmatizer, a rule-based word reordering and dropping component, and a neural machine translation system. Gloss-to-pose conversion is performed using data from a lexicon for three different signed languages, with skeletal poses extracted from videos. To generate a sentence, the text-to-gloss system is first run, and the pose representations of the resulting signs are stitched together.
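The final stitching step can be sketched as follows, assuming a hypothetical gloss-to-pose lexicon stored as NumPy arrays; the released pipeline works with richer pose objects and smooths the transitions between signs.

```python
# A minimal sketch of gloss-to-pose stitching over a toy lexicon of
# (frames, keypoints, dims) arrays standing in for signs extracted from videos.
from typing import Dict, List
import numpy as np

def stitch_glosses(glosses: List[str],
                   lexicon: Dict[str, np.ndarray]) -> np.ndarray:
    """Concatenate the skeletal pose sequence of each gloss, in order."""
    missing = [g for g in glosses if g not in lexicon]
    if missing:
        raise KeyError(f"Glosses not in lexicon: {missing}")
    return np.concatenate([lexicon[g] for g in glosses], axis=0)

# Toy lexicon; the gloss names are illustrative only.
lexicon = {
    "HEUTE": np.random.rand(30, 137, 2),
    "SCHOEN": np.random.rand(25, 137, 2),
}
video_poses = stitch_glosses(["HEUTE", "SCHOEN"], lexicon)  # shape (55, 137, 2)
```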
Abstract: We present a novel approach to automatically detect and classify great ape calls from continuous raw audio recordings collected during field research. Our method leverages deep pretrained and sequential neural networks, including wav2vec 2.0 and LSTM, and is validated on three data sets from three different great ape lineages (orangutans, chimpanzees, and bonobos). The recordings were collected by different researchers and include different annotation schemes, which our pipeline preprocesses and trains on in a uniform fashion. Our results for call detection and classification attain high accuracy. Our method is designed to generalize to other animal species and, more broadly, to sound event detection tasks. To foster future research, we make our pipeline and methods publicly available.
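A rough sketch of the modeling idea pairs a pretrained wav2vec 2.0 encoder (here loaded via torchaudio) with an LSTM classifier over its frame-level features. The model sizes, label set, and the frozen-encoder choice are assumptions rather than the paper's exact setup.

```python
# A minimal sketch: frame-level features from a pretrained wav2vec 2.0 model
# feed a bidirectional LSTM that emits per-frame call logits.
import torch
import torchaudio

bundle = torchaudio.pipelines.WAV2VEC2_BASE
wav2vec = bundle.get_model().eval()

class CallClassifier(torch.nn.Module):
    def __init__(self, num_classes: int, feature_dim: int = 768, hidden: int = 256):
        super().__init__()
        self.lstm = torch.nn.LSTM(feature_dim, hidden,
                                  batch_first=True, bidirectional=True)
        self.head = torch.nn.Linear(2 * hidden, num_classes)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():                # keep the pretrained encoder frozen
            features, _ = wav2vec.extract_features(waveform)
        out, _ = self.lstm(features[-1])     # last transformer layer's features
        return self.head(out)                # per-frame call logits

# One second of 16 kHz audio, classified into e.g. {no call, call type A, call type B}.
logits = CallClassifier(num_classes=3)(torch.randn(1, bundle.sample_rate))
```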
Abstract: Automatic sign language processing is gaining popularity in Natural Language Processing (NLP) research (Yin et al., 2021). In machine translation (MT) in particular, sign language translation based on glosses is a prominent approach. In this paper, we review recent works on neural gloss translation. We find that limitations of glosses in general and limitations of specific datasets are not discussed in a transparent manner and that there is no common standard for evaluation. To address these issues, we put forward concrete recommendations for future research on gloss translation. Our suggestions advocate awareness of the inherent limitations of gloss-based approaches, realistic datasets, stronger baselines and convincing evaluation.
Abstract: This paper presents work on novel machine translation (MT) systems between spoken and signed languages, where signed languages are represented in SignWriting, a sign language writing system. Our work seeks to address the lack of out-of-the-box support for signed languages in current MT systems and is based on the SignBank dataset, which contains pairs of spoken language text and SignWriting content. We introduce novel methods to parse, factorize, decode, and evaluate SignWriting, leveraging ideas from neural factored MT. In a bilingual setup--translating from American Sign Language to (American) English--our method achieves over 30 BLEU, while in two multilingual setups--translating in both directions between spoken languages and signed languages--we achieve over 20 BLEU. We find that common MT techniques used to improve spoken language translation similarly affect the performance of sign language translation. These findings validate our use of an intermediate text representation for signed languages to include them in natural language processing research.
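To give a flavor of the factorization, the sketch below splits symbols of Formal SignWriting (FSW) strings into base, fill, rotation, and position factors. The regex and factor names are illustrative and not the paper's exact tokenization.

```python
# A minimal sketch of factorizing SignWriting symbols for factored MT:
# each FSW symbol token decomposes into a base symbol, fill, rotation,
# and its x/y position inside the sign box.
import re
from typing import Dict, List

SYMBOL = re.compile(r"S([0-9a-f]{3})([0-9a-f])([0-9a-f])(\d{3})x(\d{3})")

def factorize_fsw(fsw: str) -> List[Dict[str, str]]:
    return [
        {"base": m[0], "fill": m[1], "rotation": m[2], "x": m[3], "y": m[4]}
        for m in SYMBOL.findall(fsw)
    ]

# An example FSW string containing a sign box with two symbols.
print(factorize_fsw("M518x529S14c20481x471S27106503x489"))
```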
Abstract: The expanding use of complex machine learning methods such as deep learning has led to rapid growth in human activity recognition, particularly in applications to health. As part of larger body sensor network systems, face and full-body analysis is becoming increasingly common for evaluating health status. However, complex models that handle private and sometimes protected data raise concerns about the potential leakage of identifiable information. In this work, we focus on the case of a deep network model trained on images of individual faces. Full-face video recordings from 493 individuals undergoing an eye-tracking-based evaluation of neurological function were used. Outputs, gradients, intermediate layer outputs, loss, and labels were used as inputs to a deep network with an added support vector machine emission layer to recognize membership in the training data. The inference attack method and associated mathematical analysis indicate a low likelihood of unintended memorization of facial features in the deep learning model. We show that the model in question preserves the integrity of its training data with reasonable confidence. The same procedure can be applied to other models under similar conditions.
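A minimal sketch of this style of membership inference follows, assuming a toy target model and using per-example loss, confidence, and gradient norm as attack features fed to an SVM; the study's attack uses richer signals (intermediate layer outputs and labels) computed on the actual face model.

```python
# A toy membership inference recipe: per-example signals from the target model
# become features for an SVM that predicts training-set membership.
import numpy as np
import torch
import torch.nn.functional as F
from sklearn.svm import SVC

# Stand-in for the trained target model under attack.
target = torch.nn.Sequential(torch.nn.Linear(64, 32), torch.nn.ReLU(),
                             torch.nn.Linear(32, 2))

def attack_features(x: torch.Tensor, y: torch.Tensor) -> np.ndarray:
    """Loss, top-class confidence, and gradient norm for one example."""
    target.zero_grad()
    logits = target(x)
    loss = F.cross_entropy(logits, y)
    loss.backward()
    grad_norm = sum(p.grad.norm() for p in target.parameters())
    return np.array([loss.item(), logits.softmax(-1).max().item(), grad_norm.item()])

# Toy members / non-members; a real attack uses the model's actual training split.
members = np.stack([attack_features(torch.randn(1, 64), torch.tensor([0])) for _ in range(50)])
nonmembers = np.stack([attack_features(torch.randn(1, 64), torch.tensor([0])) for _ in range(50)])
X = np.concatenate([members, nonmembers])
y = np.array([1] * 50 + [0] * 50)
attack = SVC().fit(X, y)       # membership classifier (SVM emission layer analogue)
print(attack.score(X, y))      # attack accuracy on its own data; the point is the recipe
```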