Muskaan Singh

Avengers Assemble: Amalgamation of Non-Semantic Features for Depression Detection

Sep 22, 2024

Orchid Chetia Phukan, Swarup Ranjan Behera, Shubham Singh, Muskaan Singh, Vandana Rajan, Arun Balaji Buduru, Rajesh Sharma, S. R. Mahadeva Prasanna

Abstract: In this study, we address the challenge of depression detection from speech, focusing on the potential of non-semantic features (NSFs) to capture subtle markers of depression. While prior research has leveraged various features for this task, NSFs extracted from pre-trained models (PTMs) designed for non-semantic tasks, such as paralinguistic speech processing (TRILLsson), speaker recognition (x-vector), and emotion recognition (emoHuBERT), have shown significant promise. However, the potential of combining these diverse features has not been fully explored. In this work, we demonstrate that the amalgamation of NSFs results in complementary behavior, leading to enhanced depression detection performance. To this end, we introduce a simple novel framework, FuSeR, designed to effectively combine these features. Our results show that FuSeR outperforms models utilizing individual NSFs as well as baseline fusion techniques and obtains state-of-the-art (SOTA) performance on the E-DAIC benchmark with an RMSE of 5.51 and an MAE of 4.48, establishing it as a robust approach for depression detection.

* Submitted to ICASSP 2025
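
The two error metrics quoted in the abstract above can be computed as in the following minimal sketch. The score lists are hypothetical illustration values, not E-DAIC data, and the helper names `rmse` and `mae` are our own.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error of two equal-length, non-empty sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def mae(y_true, y_pred):
    """Mean absolute error of two equal-length, non-empty sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical severity scores, for illustration only.
y_true = [4, 10, 15, 7]
y_pred = [6, 9, 11, 7]

print(f"RMSE = {rmse(y_true, y_pred):.3f}")
print(f"MAE  = {mae(y_true, y_pred):.3f}")
```

RMSE penalises large errors more heavily than MAE and is never smaller than MAE on the same predictions, which is consistent with the reported pair (5.51 versus 4.48).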

Speech Recognition Transformers: Topological-lingualism Perspective

Aug 27, 2024

Shruti Singh, Muskaan Singh, Virender Kadyan

Abstract: Transformers have achieved great success in various artificial intelligence tasks. Thanks to the recent prevalence of self-attention mechanisms, which capture long-term dependencies, they have produced phenomenal outcomes in speech processing and recognition tasks. The paper presents a comprehensive survey of transformer techniques oriented toward the speech modality. The main contents of this survey include (1) background on traditional ASR, the end-to-end transformer ecosystem, and speech transformers; (2) foundational models in speech via a lingualism paradigm, i.e., monolingual, bilingual, multilingual, and cross-lingual; (3) datasets and languages, acoustic features, architecture, decoding, and evaluation metrics from a specific topological-lingualism perspective; and (4) popular speech transformer toolkits for building end-to-end ASR systems. Finally, we discuss open challenges and potential research directions for the community to conduct further research in this domain.

Modality-Order Matters! A Novel Hierarchical Feature Fusion Method for CoSAm: A Code-Switched Autism Corpus

Jul 19, 2024

Mohd Mujtaba Akhtar, Girish, Muskaan Singh, Orchid Chetia Phukan

Abstract: Autism Spectrum Disorder (ASD) is a complex neuro-developmental challenge, presenting a spectrum of difficulties in social interaction, communication, and the expression of repetitive behaviors in different situations. Its increasing prevalence underscores the importance of ASD as a major public health concern and the need for comprehensive research initiatives to advance our understanding of the disorder and its early detection methods. This study introduces a novel hierarchical feature fusion method aimed at enhancing the early detection of ASD in children through the analysis of code-switched speech (English and Hindi). Employing advanced audio processing techniques, the research integrates acoustic, paralinguistic, and linguistic information using Transformer Encoders. This fusion strategy is designed to improve classification robustness and accuracy, which are crucial for early and precise ASD identification. The methodology involves collecting a code-switched speech corpus, CoSAm, from children diagnosed with ASD and a matched control group. The dataset comprises 61 voice recordings, from 30 children diagnosed with ASD and 31 neurotypical children, aged between 3 and 13 years, resulting in a total of 159.75 minutes of voice recordings. The feature analysis focuses on MFCCs and extensive statistical attributes to capture speech pattern variability and complexity. The best performance, an accuracy of 98.75%, is achieved by hierarchically fusing acoustic and linguistic features first, followed by paralinguistic features.

* Submitted to Computer Speech & Language
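
The staging described in the abstract above (acoustic and linguistic features first, paralinguistic features second) can be sketched as follows. This is only an illustration of the fusion order: the paper fuses with Transformer Encoders, whereas the `fuse` operator here is a hypothetical stand-in (concatenate, then L2-normalise), and the feature vectors are made-up values.

```python
import math
from typing import List

def fuse(a: List[float], b: List[float]) -> List[float]:
    """Stand-in fusion step: concatenate two vectors, then L2-normalise.

    Assumption: this placeholder replaces the Transformer Encoder fusion
    used in the paper; it exists only so the staging can be shown.
    """
    joined = a + b
    norm = math.sqrt(sum(x * x for x in joined))
    return [x / norm for x in joined] if norm > 0 else joined

def hierarchical_fusion(first: List[float], second: List[float],
                        third: List[float]) -> List[float]:
    """Fuse the first two modalities, then fuse the result with the third."""
    stage1 = fuse(first, second)
    return fuse(stage1, third)

# Hypothetical per-recording feature vectors.
acoustic = [3.0, 4.0]
linguistic = [0.0]
paralinguistic = [1.0]

# Order reported as best in the abstract: (acoustic + linguistic), then paralinguistic.
fused = hierarchical_fusion(acoustic, linguistic, paralinguistic)
# A different modality order gives a different representation.
other_order = hierarchical_fusion(paralinguistic, linguistic, acoustic)
print(fused)
print(other_order)
```

Because each stage re-normalises its input, the result depends on which modalities are combined first, which is the property the title "Modality-Order Matters" refers to.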

ComFeAT: Combination of Neural and Spectral Features for Improved Depression Detection

Jun 10, 2024

Orchid Chetia Phukan, Sarthak Jain, Shubham Singh, Muskaan Singh, Arun Balaji Buduru, Rajesh Sharma

Abstract: In this work, we focus on the detection of depression through speech analysis. Previous research has widely explored features extracted from pre-trained models (PTMs) primarily trained for paralinguistic tasks. Although these features have led to substantial advances in speech-based depression detection, their performance declines in real-world settings. To address this, we introduce ComFeAT, an application that employs a CNN model trained on a combination of features extracted from PTMs (neural features) and spectral features to enhance depression detection. Spectral features are robust to domain variations but do not perform as well as neural features. Surprisingly, combining them shows complementary behavior and improves over both neural and spectral features individually. The proposed method also improves over previous state-of-the-art (SOTA) works on the E-DAIC benchmark.

* Accepted to INTERSPEECH 2024 Show & Tell Demonstrations

NeuRO: An Application for Code-Switched Autism Detection in Children

Jun 05, 2024

Mohd Mujtaba Akhtar, Girish, Orchid Chetia Phukan, Muskaan Singh

Abstract: Code-switching is a common communication phenomenon where individuals alternate between two or more languages or linguistic styles within a single conversation. Autism Spectrum Disorder (ASD) is a developmental disorder posing challenges in social interaction, communication, and repetitive behaviors. Detecting ASD in code-switched scenarios presents unique challenges. In this paper, we address this problem by building an application, NeuRO, which aims to detect potential signs of autism in code-switched conversations, facilitating early intervention and support for individuals with ASD.

* Accepted to INTERSPEECH 24 Show & Tell Demonstrations

IDIAPers @ Causal News Corpus 2022: Efficient Causal Relation Identification Through a Prompt-based Few-shot Approach

Sep 08, 2022

Sergio Burdisso, Juan Zuluaga-Gomez, Esau Villatoro-Tello, Martin Fajcik, Muskaan Singh, Pavel Smrz, Petr Motlicek

Abstract: In this paper, we describe our participation in Subtask 1 of CASE-2022, Event Causality Identification with Causal News Corpus. We address the Causal Relation Identification (CRI) task by exploiting a set of simple yet complementary techniques for fine-tuning language models (LMs) on a small number of annotated examples (i.e., a few-shot configuration). We follow a prompt-based prediction approach for fine-tuning LMs in which the CRI task is treated as a masked language modeling (MLM) problem. This approach allows LMs natively pre-trained on MLM problems to directly generate textual responses to CRI-specific prompts. We compare the performance of this method against ensemble techniques trained on the entire dataset. Our best-performing submission was trained with only 256 instances per class, a small portion of the entire dataset, and yet was able to obtain the second-best precision (0.82), third-best accuracy (0.82), and an F1-score (0.85) very close to what was reported by the winning team (0.86).

* This manuscript has been submitted to the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE @ EMNLP 2022)
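
Treating classification as masked language modeling, as the abstract above describes, means wrapping each sentence in a cloze prompt and reading the label off the word predicted at the mask position. The sketch below shows only that framing. The template, the verbalizer words, and the score dictionary are all hypothetical; the abstract does not state the actual prompt, and a real system would take the scores from a masked LM.

```python
MASK = "[MASK]"

# Hypothetical prompt template and label words (not taken from the paper).
TEMPLATE = "{sentence} Is there a causal relation? {mask}."
VERBALIZER = {"yes": 1, "no": 0}  # label word -> class id (1 = causal)

def build_prompt(sentence: str, mask: str = MASK) -> str:
    """Wrap a sentence in a cloze-style prompt containing one mask token."""
    return TEMPLATE.format(sentence=sentence.strip(), mask=mask)

def predict_label(mask_scores: dict) -> int:
    """Map scores at the mask position to a class via the verbalizer.

    `mask_scores` stands in for a masked LM's vocabulary scores; words
    outside the verbalizer are ignored.
    """
    candidates = {w: mask_scores.get(w, float("-inf")) for w in VERBALIZER}
    best_word = max(candidates, key=candidates.get)
    return VERBALIZER[best_word]

prompt = build_prompt("The strike began after wages were cut.")
fake_scores = {"yes": 2.3, "no": -0.4, "maybe": 1.0}  # placeholder scores
label = predict_label(fake_scores)
print(prompt)
print(label)
```

Restricting the decision to the verbalizer words is what lets a model pre-trained on MLM be reused for classification without adding a new output layer.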

IDIAPers @ Causal News Corpus 2022: Extracting Cause-Effect-Signal Triplets via Pre-trained Autoregressive Language Model

Sep 08, 2022

Martin Fajcik, Muskaan Singh, Juan Zuluaga-Gomez, Esaú Villatoro-Tello, Sergio Burdisso, Petr Motlicek, Pavel Smrz

Abstract: In this paper, we describe our shared task submissions for Subtask 2 in CASE-2022, Event Causality Identification with Causal News Corpus. The challenge focused on the automatic detection of all cause-effect-signal spans present in a sentence from news media. We detect cause-effect-signal spans in a sentence using T5, a pre-trained autoregressive language model. We iteratively identify all cause-effect-signal span triplets, always conditioning the prediction of the next triplet on the previously predicted ones. To predict the triplet itself, we consider different causal relationships such as cause$\rightarrow$effect$\rightarrow$signal. Each triplet component is generated via a language model conditioned on the sentence, the previous parts of the current triplet, and previously predicted triplets. Despite training on an extremely small dataset of 160 samples, our approach achieved competitive performance, being placed second in the competition. Furthermore, we show that assuming either cause$\rightarrow$effect or effect$\rightarrow$cause order achieves similar results. Our code and model predictions will be released online.

* Manuscript submitted to CASE@EMNLP
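
The iterative scheme in the abstract above, where each new triplet is predicted conditioned on the ones already found, can be sketched as a simple loop. This is a simplified illustration: the input format produced by `format_input` is hypothetical, the generator is a stub rather than T5, and a whole triplet is produced per call instead of component by component as in the paper.

```python
from typing import Callable, List, Optional, Tuple

Triplet = Tuple[str, str, str]  # (cause, effect, signal)

def format_input(sentence: str, previous: List[Triplet]) -> str:
    """Serialise the sentence and earlier triplets (hypothetical format)."""
    history = " ".join(
        f"[cause] {c} [effect] {e} [signal] {s}" for c, e, s in previous
    )
    return f"sentence: {sentence} previous: {history}".strip()

def extract_triplets(
    sentence: str,
    generate: Callable[[str], Optional[Triplet]],
    max_triplets: int = 4,
) -> List[Triplet]:
    """Repeatedly ask the generator for the next triplet until it stops."""
    triplets: List[Triplet] = []
    for _ in range(max_triplets):
        nxt = generate(format_input(sentence, triplets))
        if nxt is None or nxt in triplets:  # nothing new left to extract
            break
        triplets.append(nxt)
    return triplets

def stub_generate(model_input: str) -> Optional[Triplet]:
    """Stub standing in for the language model: returns canned answers."""
    canned = [("wages were cut", "the strike began", "after")]
    seen = model_input.count("[cause]")  # triplets already in the input
    return canned[seen] if seen < len(canned) else None

result = extract_triplets("The strike began after wages were cut.", stub_generate)
print(result)
```

Feeding earlier predictions back into the input is what lets a single model return several distinct triplets for one sentence instead of repeating the same one.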

ALIGNMEET: A Comprehensive Tool for Meeting Annotation, Alignment, and Evaluation

May 11, 2022

Peter Polák, Muskaan Singh, Anna Nedoluzhko, Ondřej Bojar

Abstract: Summarization is a challenging problem, and even more challenging is to manually create, correct, and evaluate the summaries. The severity of the problem grows when the inputs are multi-party dialogues in a meeting setup. To facilitate the research in this area, we present ALIGNMEET, a comprehensive tool for meeting annotation, alignment, and evaluation. The tool aims to provide an efficient and clear interface for fast annotation while mitigating the risk of introducing errors. Moreover, we add an evaluation mode that enables a comprehensive quality evaluation of meeting minutes. To the best of our knowledge, there is no such tool available. We release the tool as open source. It is also directly installable from PyPI.

* Accepted to LREC22

RerrFact: Reduced Evidence Retrieval Representations for Scientific Claim Verification

Feb 05, 2022

Ashish Rana, Deepanshu Khanna, Muskaan Singh, Tirthankar Ghosal, Harpreet Singh, Prashant Singh Rana

Abstract: Exponential growth in digital information outlets and the race to publish have made scientific misinformation more prevalent than ever. However, the task of fact-verifying a given scientific claim is not straightforward even for researchers. Scientific claim verification requires in-depth knowledge and great labor from domain experts to substantiate supporting and refuting evidence from credible scientific sources. The SciFact dataset and corresponding task provide a benchmarking leaderboard for the community to develop automatic scientific claim verification systems via extracting and assimilating relevant evidence rationales from source abstracts. In this work, we propose a modular approach that sequentially carries out binary classification for every prediction subtask as in the SciFact leaderboard. Our simple classifier-based approach uses reduced abstract representations to retrieve relevant abstracts. These are further used to train the relevant rationale-selection model. Finally, we carry out two-step stance predictions that first differentiate non-relevant rationales and then identify supporting or refuting rationales for a given claim. Experimentally, our system RerrFact, with no fine-tuning, a simple design, and a fraction of the model parameters, fares competitively on the leaderboard against large-scale, modular, and joint modeling approaches. We make our codebase available at https://github.com/ashishrana160796/RerrFact.

* Accepted in the AAAI-22 Workshop on Scientific Document Understanding at the Thirty-Sixth AAAI Conference on Artificial Intelligence (SDU@AAAI-22)
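
The sequence of binary decisions described in the abstract above (abstract retrieval, rationale selection, then a two-step stance prediction) can be sketched as a chain of yes/no classifiers. This is not the RerrFact code: the classifiers are keyword stubs, the label strings are hypothetical, and the majority vote used to aggregate sentence-level stances is our own assumption.

```python
from typing import Callable, Dict, List

def verify_claim(
    claim: str,
    abstracts: Dict[str, List[str]],
    is_relevant_abstract: Callable[[str, List[str]], bool],
    is_rationale: Callable[[str, str], bool],
    is_relevant_rationale: Callable[[str, str], bool],
    supports: Callable[[str, str], bool],
) -> Dict[str, dict]:
    """Run four binary classifiers in sequence for one claim."""
    results: Dict[str, dict] = {}
    for doc_id, sentences in abstracts.items():
        # Step 1: abstract retrieval.
        if not is_relevant_abstract(claim, sentences):
            continue
        # Step 2: rationale selection within the retrieved abstract.
        rationales = [s for s in sentences if is_rationale(claim, s)]
        # Step 3 (stance, part one): drop non-relevant rationales.
        kept = [s for s in rationales if is_relevant_rationale(claim, s)]
        if not kept:
            continue
        # Step 4 (stance, part two): support versus refute.
        votes = [supports(claim, s) for s in kept]
        # Assumption: aggregate sentence-level votes by majority.
        label = "SUPPORTS" if 2 * sum(votes) >= len(votes) else "REFUTES"
        results[doc_id] = {"label": label, "rationales": kept}
    return results

# Keyword stubs standing in for trained classifiers.
def stub_abstract(claim, sentences):
    return any("fracture" in s.lower() for s in sentences)

def stub_rationale(claim, sentence):
    return "fracture" in sentence.lower()

def stub_relevant(claim, sentence):
    return "vitamin d" in sentence.lower()

def stub_supports(claim, sentence):
    return "lowered" in sentence.lower()

claim = "Vitamin D reduces fracture risk."
abstracts = {
    "doc1": ["Vitamin D supplementation lowered fracture risk in adults.",
             "The trial ran for two years."],
    "doc2": ["Bees communicate through dance."],
}
result = verify_claim(claim, abstracts, stub_abstract, stub_rationale,
                      stub_relevant, stub_supports)
print(result)
```

Keeping every stage a separate binary classifier means each one can be trained and evaluated on its own, which matches the modular design the abstract emphasises.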
