Chayma Fourati

Bambara Language Dataset for Sentiment Analysis

Aug 05, 2021

Mountaga Diallo, Chayma Fourati, Hatem Haddad

Abstract: For easier communication, posting, or commenting on each other's posts, people use their dialects. In Africa, various languages and dialects exist. However, they are still underrepresented and not fully exploited for analytical studies and research purposes. Approaches such as Machine Learning and Deep Learning require datasets. One of the African languages is Bambara, used by citizens in different countries. However, no previous work has produced a Sentiment Analysis dataset for this language. In this paper, we present the first common-crawl-based Bambara dialectal dataset dedicated to Sentiment Analysis, freely available for Natural Language Processing research purposes.

* 2nd Workshop on Practical ML for Developing Countries: Learning Under Limited/low Resource Scenarios, International Conference on Learning Representations, 2021

AI4D -- African Language Program

Apr 06, 2021

Kathleen Siminyu, Godson Kalipe, Davor Orlic, Jade Abbott, Vukosi Marivate, Sackey Freshia, Prateek Sibal, Bhanu Neupane, David I. Adelani, Amelia Taylor (+8 more)

Abstract: Advances in speech and language technologies enable tools such as voice-search, text-to-speech, speech recognition and machine translation. These are, however, only available for high-resource languages like English, French or Chinese. Without foundational digital resources for African languages, which are considered low-resource in the digital context, these advanced tools remain out of reach. This work details the AI4D - African Language Program, a 3-part project that 1) incentivised the crowd-sourcing, collection and curation of language datasets through an online quantitative and qualitative challenge, 2) supported research fellows for a period of 3-4 months to create datasets annotated for NLP tasks, and 3) hosted competitive Machine Learning challenges on the basis of these datasets. Key outcomes of the work so far include 1) the creation of 9+ open source African language datasets annotated for a variety of ML tasks, and 2) the creation of baseline models for these datasets through the hosting of competitive ML challenges.

A Multilingual African Embedding for FAQ Chatbots

Mar 16, 2021

Aymen Ben Elhaj Mabrouk, Moez Ben Haj Hmida, Chayma Fourati, Hatem Haddad, Abir Messaoudi

Abstract: Searching for available, reliable, official, and understandable information is not a trivial task, because information is scattered across the internet and governmental communication channels in African dialects and languages are lacking. In this paper, we introduce an Artificial Intelligence-powered chatbot for crisis communication that is omnichannel, multilingual and multi-dialectal. We present our work on a modified StarSpace embedding tailored to African dialects for the question-answering task, along with the architecture of the proposed chatbot system and a description of its layers. English, French, Arabic, Tunisian, Igbo, Yorùbá, and Hausa are used as languages and dialects. Quantitative and qualitative evaluation results are obtained for our deployed Covid-19 chatbot. Results show that users are satisfied and that the conversation with the chatbot meets customer needs.

Learning Word Representations for Tunisian Sentiment Analysis

Oct 14, 2020

Abir Messaoudi, Hatem Haddad, Moez Ben HajHmida, Chayma Fourati, Abderrazak Ben Hamida

Abstract: Tunisians on social media tend to express themselves in their local dialect using Latin script (TUNIZI). This raises an additional challenge to the process of exploring and recognizing online opinions. To date, very little work has addressed TUNIZI sentiment analysis due to scarce resources for training an automated system. In this paper, we focus on sentiment analysis of the Tunisian dialect as used on social media. Most previous work used machine learning techniques combined with handcrafted features. More recently, Deep Neural Networks have been widely used for this task, especially for the English language. We explore the importance of various unsupervised word representations (word2vec, BERT) and investigate the use of Convolutional Neural Networks and Bidirectional Long Short-Term Memory. Without using any handcrafted features, our experimental results on two publicly available datasets show performance comparable to that reported for other languages.

TUNIZI: a Tunisian Arabizi sentiment analysis Dataset

Apr 29, 2020

Chayma Fourati, Abir Messaoudi, Hatem Haddad

Abstract: On social media, Arabic speakers tend to express themselves in their own local dialects. More particularly, Tunisians use an informal form of writing called "Tunisian Arabizi". Analytical studies seek to explore and recognize online opinions in order to exploit them for planning and prediction purposes, such as measuring customer satisfaction and establishing sales and marketing strategies. However, analytical studies based on Deep Learning are data hungry, while African languages and dialects are considered low-resource languages. For instance, to the best of our knowledge, no annotated Tunisian Arabizi dataset exists. In this paper, we introduce TUNIZI, a Tunisian Arabizi sentiment analysis dataset, collected from social networks, preprocessed for analytical studies and annotated manually by Tunisian native speakers.
