Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Dipali Kadam

Multilingual Evaluation of Semantic Textual Relatedness

Apr 13, 2024

Sharvi Endait, Srushti Sonavane, Ridhima Sinare, Pritika Rohera, Advait Naik, Dipali Kadam

Figure 1 for Multilingual Evaluation of Semantic Textual Relatedness

Figure 2 for Multilingual Evaluation of Semantic Textual Relatedness

Figure 3 for Multilingual Evaluation of Semantic Textual Relatedness

Figure 4 for Multilingual Evaluation of Semantic Textual Relatedness

Abstract:The explosive growth of online content demands robust Natural Language Processing (NLP) techniques that can capture nuanced meanings and cultural context across diverse languages. Semantic Textual Relatedness (STR) goes beyond superficial word overlap, considering linguistic elements and non-linguistic factors like topic, sentiment, and perspective. Despite its pivotal role, prior NLP research has predominantly focused on English, limiting its applicability across languages. Addressing this gap, our paper dives into capturing deeper connections between sentences beyond simple word overlap. Going beyond English-centric NLP research, we explore STR in Marathi, Hindi, Spanish, and English, unlocking the potential for information retrieval, machine translation, and more. Leveraging the SemEval-2024 shared task, we explore various language models across three learning paradigms: supervised, unsupervised, and cross-lingual. Our comprehensive methodology gains promising results, demonstrating the effectiveness of our approach. This work aims to not only showcase our achievements but also inspire further research in multilingual STR, particularly for low-resourced languages.

* 8 pages

Via

Access Paper or Ask Questions

Enhancing Low Resource NER Using Assisting Language And Transfer Learning

Jun 10, 2023

Maithili Sabane, Aparna Ranade, Onkar Litake, Parth Patil, Raviraj Joshi, Dipali Kadam

Figure 1 for Enhancing Low Resource NER Using Assisting Language And Transfer Learning

Figure 2 for Enhancing Low Resource NER Using Assisting Language And Transfer Learning

Figure 3 for Enhancing Low Resource NER Using Assisting Language And Transfer Learning

Figure 4 for Enhancing Low Resource NER Using Assisting Language And Transfer Learning

Abstract:Named Entity Recognition (NER) is a fundamental task in NLP that is used to locate the key information in text and is primarily applied in conversational and search systems. In commercial applications, NER or comparable slot-filling methods have been widely deployed for popular languages. NER is used in applications such as human resources, customer service, search engines, content classification, and academia. In this paper, we draw focus on identifying name entities for low-resource Indian languages that are closely related, like Hindi and Marathi. We use various adaptations of BERT such as baseBERT, AlBERT, and RoBERTa to train a supervised NER model. We also compare multilingual models with monolingual models and establish a baseline. In this work, we show the assisting capabilities of the Hindi and Marathi languages for the NER task. We show that models trained using multiple languages perform better than a single language. However, we also observe that blind mixing of all datasets doesn't necessarily provide improvements and data selection methods may be required.

* Accepted at International Conference on Applied Artificial Intelligence and Computing (ICAAIC) 2023

Via

Access Paper or Ask Questions

PICT@DravidianLangTech-ACL2022: Neural Machine Translation On Dravidian Languages

Apr 19, 2022

Aditya Vyawahare, Rahul Tangsali, Aditya Mandke, Onkar Litake, Dipali Kadam

Figure 1 for PICT@DravidianLangTech-ACL2022: Neural Machine Translation On Dravidian Languages

Figure 2 for PICT@DravidianLangTech-ACL2022: Neural Machine Translation On Dravidian Languages

Figure 3 for PICT@DravidianLangTech-ACL2022: Neural Machine Translation On Dravidian Languages

Figure 4 for PICT@DravidianLangTech-ACL2022: Neural Machine Translation On Dravidian Languages

Abstract:This paper presents a summary of the findings that we obtained based on the shared task on machine translation of Dravidian languages. We stood first in three of the five sub-tasks which were assigned to us for the main shared task. We carried out neural machine translation for the following five language pairs: Kannada to Tamil, Kannada to Telugu, Kannada to Malayalam, Kannada to Sanskrit, and Kannada to Tulu. The datasets for each of the five language pairs were used to train various translation models, including Seq2Seq models such as LSTM, bidirectional LSTM, Conv2Seq, and training state-of-the-art as transformers from scratch, and fine-tuning already pre-trained models. For some models involving monolingual corpora, we implemented backtranslation as well. These models' accuracy was later tested with a part of the same dataset using BLEU score as an evaluation metric.

Via

Access Paper or Ask Questions

Optimize_Prime@DravidianLangTech-ACL2022: Abusive Comment Detection in Tamil

Apr 19, 2022

Shantanu Patankar, Omkar Gokhale, Onkar Litake, Aditya Mandke, Dipali Kadam

Figure 1 for Optimize_Prime@DravidianLangTech-ACL2022: Abusive Comment Detection in Tamil

Figure 2 for Optimize_Prime@DravidianLangTech-ACL2022: Abusive Comment Detection in Tamil

Figure 3 for Optimize_Prime@DravidianLangTech-ACL2022: Abusive Comment Detection in Tamil

Abstract:This paper tries to address the problem of abusive comment detection in low-resource indic languages. Abusive comments are statements that are offensive to a person or a group of people. These comments are targeted toward individuals belonging to specific ethnicities, genders, caste, race, sexuality, etc. Abusive Comment Detection is a significant problem, especially with the recent rise in social media users. This paper presents the approach used by our team - Optimize_Prime, in the ACL 2022 shared task "Abusive Comment Detection in Tamil." This task detects and classifies YouTube comments in Tamil and Tamil- English Codemixed format into multiple categories. We have used three methods to optimize our results: Ensemble models, Recurrent Neural Networks, and Transformers. In the Tamil data, MuRIL and XLM-RoBERTA were our best performing models with a macro-averaged f1 score of 0.43. Furthermore, for the Code-mixed data, MuRIL and M-BERT provided sub-lime results, with a macro-averaged f1 score of 0.45.

* arXiv admin note: text overlap with arXiv:2204.09087

Via

Access Paper or Ask Questions

Optimize_Prime@DravidianLangTech-ACL2022: Emotion Analysis in Tamil

Apr 19, 2022

Omkar Gokhale, Shantanu Patankar, Onkar Litake, Aditya Mandke, Dipali Kadam

Figure 1 for Optimize_Prime@DravidianLangTech-ACL2022: Emotion Analysis in Tamil

Figure 2 for Optimize_Prime@DravidianLangTech-ACL2022: Emotion Analysis in Tamil

Figure 3 for Optimize_Prime@DravidianLangTech-ACL2022: Emotion Analysis in Tamil

Figure 4 for Optimize_Prime@DravidianLangTech-ACL2022: Emotion Analysis in Tamil

Abstract:This paper aims to perform an emotion analysis of social media comments in Tamil. Emotion analysis is the process of identifying the emotional context of the text. In this paper, we present the findings obtained by Team Optimize_Prime in the ACL 2022 shared task "Emotion Analysis in Tamil." The task aimed to classify social media comments into categories of emotion like Joy, Anger, Trust, Disgust, etc. The task was further divided into two subtasks, one with 11 broad categories of emotions and the other with 31 specific categories of emotion. We implemented three different approaches to tackle this problem: transformer-based models, Recurrent Neural Networks (RNNs), and Ensemble models. XLM-RoBERTa performed the best on the first task with a macro-averaged f1 score of 0.27, while MuRIL provided the best results on the second task with a macro-averaged f1 score of 0.13.

Via

Access Paper or Ask Questions

Analyzing Architectures for Neural Machine Translation Using Low Computational Resources

Nov 06, 2021

Aditya Mandke, Onkar Litake, Dipali Kadam

Figure 1 for Analyzing Architectures for Neural Machine Translation Using Low Computational Resources

Figure 2 for Analyzing Architectures for Neural Machine Translation Using Low Computational Resources

Figure 3 for Analyzing Architectures for Neural Machine Translation Using Low Computational Resources

Figure 4 for Analyzing Architectures for Neural Machine Translation Using Low Computational Resources

Abstract:With the recent developments in the field of Natural Language Processing, there has been a rise in the use of different architectures for Neural Machine Translation. Transformer architectures are used to achieve state-of-the-art accuracy, but they are very computationally expensive to train. Everyone cannot have such setups consisting of high-end GPUs and other resources. We train our models on low computational resources and investigate the results. As expected, transformers outperformed other architectures, but there were some surprising results. Transformers consisting of more encoders and decoders took more time to train but had fewer BLEU scores. LSTM performed well in the experiment and took comparatively less time to train than transformers, making it suitable to use in situations having time constraints.

Via

Access Paper or Ask Questions