Abstract: Applying DevOps practices to machine learning systems is termed MLOps; unlike traditional software, which evolves with changing requirements, machine learning systems evolve with new data. The objective of MLOps is to connect different open-source tools into a pipeline that automatically constructs the dataset, trains the machine learning model, and deploys it to production, while also storing different versions of the models and datasets. MLOps thereby ensures fast delivery of newly trained models to production so that results remain accurate. Furthermore, MLOps practice affects the overall quality of the software product and depends heavily on open-source tools; selecting the relevant tools is challenging, and a generalized method for choosing appropriate open-source tools is desirable. In this paper, we present a recommendation framework that processes the contextual information of a machine learning project (e.g., the nature and type of the data) and recommends a relevant toolchain (tech stack) for operationalizing the machine learning system. To assess the applicability of the proposed framework, four approaches, i.e., rule-based, random forest, decision tree, and k-nearest neighbors, were investigated and evaluated with precision, recall, and F-score; random forest outperformed the other approaches with the highest F-score of 0.66.
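As an illustration only (not the paper's implementation), the Python sketch below compares the three learning-based approaches named in the abstract using scikit-learn and reports precision, recall, and F-score; the synthetic data is a placeholder for the encoded project-context features and toolchain labels used in the actual study.

    # Placeholder comparison of the classifiers named in the abstract.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import precision_recall_fscore_support
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier

    # Synthetic stand-in for encoded project context (data type, data nature, ...)
    # and toolchain (tech-stack) labels.
    X, y = make_classification(n_samples=300, n_features=10, n_informative=6,
                               n_classes=4, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                        random_state=42)

    models = {
        "random forest": RandomForestClassifier(n_estimators=100, random_state=42),
        "decision tree": DecisionTreeClassifier(random_state=42),
        "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
    }
    for name, model in models.items():
        model.fit(X_train, y_train)
        p, r, f, _ = precision_recall_fscore_support(
            y_test, model.predict(X_test), average="weighted", zero_division=0)
        print(f"{name}: precision={p:.2f} recall={r:.2f} f-score={f:.2f}")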
Abstract: Measuring empathy in conversation is challenging, as empathy is a complex and multifaceted psychological construct that involves both cognitive and emotional components. Human evaluations can be subjective, leading to inconsistent results, so an automatic method that reduces the need for human evaluation is desirable. In this paper, we propose EMP-EVAL, a simple yet effective automatic empathy evaluation method. The proposed technique takes into account emotion, cognitive empathy, and emotional empathy. To the best of our knowledge, our work is the first to systematically measure empathy without human-annotated scores. Experimental results demonstrate that our metric correlates with human preferences, achieving results comparable to human judgments.
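As a hedged illustration (not part of EMP-EVAL itself), the snippet below shows how an automatic empathy score could be checked for agreement with human judgments via correlation; the scores are made-up placeholders, and the paper defines its own sub-metrics and their combination.

    # Placeholder check of metric-vs-human agreement.
    from scipy.stats import pearsonr, spearmanr

    system_scores = [0.72, 0.35, 0.58, 0.81, 0.44]  # hypothetical automatic scores
    human_scores = [4.0, 2.0, 3.0, 4.5, 2.5]        # hypothetical human ratings

    print("Pearson:", pearsonr(system_scores, human_scores)[0])
    print("Spearman:", spearmanr(system_scores, human_scores)[0])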
Abstract: Recent studies have shown that multilingual language models underperform their monolingual counterparts. It is also well known that training and maintaining monolingual models for each language is costly and time-consuming. Roman Urdu is a resource-starved language popularly used on social media platforms and chat apps. In this research, we propose a novel dataset of scraped tweets containing 54M tokens and 3M sentences. We also propose RUBERT, a bilingual Roman Urdu model created by additional pretraining of English BERT. We compare its performance with a monolingual Roman Urdu BERT trained from scratch and a multilingual Roman Urdu BERT created by additional pretraining of multilingual BERT. Our experiments show that additional pretraining of the English BERT produces the most notable performance improvement.
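For illustration, a minimal sketch of "additional pretraining" of English BERT on Roman Urdu text with Hugging Face Transformers is shown below; the corpus file, hyperparameters, and training schedule are placeholders, not those used to build RUBERT.

    # Placeholder continued pretraining (masked language modeling) of English BERT.
    from datasets import load_dataset
    from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

    # "roman_urdu_tweets.txt" is a placeholder for the scraped tweet corpus.
    dataset = load_dataset("text", data_files={"train": "roman_urdu_tweets.txt"})
    tokenized = dataset.map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
        batched=True, remove_columns=["text"])

    collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="rubert", num_train_epochs=1,
                               per_device_train_batch_size=16),
        train_dataset=tokenized["train"],
        data_collator=collator,
    )
    trainer.train()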
Abstract: Pretrained language models are now in widespread use in natural language processing. Despite their success, applying them to low-resource languages remains a huge challenge. Although multilingual models hold great promise, applying them to a specific low-resource language, e.g., Roman Urdu, can be excessive. In this paper, we show how the code-switching property of languages can be used to perform cross-lingual transfer learning from a corresponding high-resource language. We also show how this transfer learning technique, termed bilingual language modeling, can be used to produce better-performing models for Roman Urdu. To enable training and experimentation, we also present a collection of novel corpora for Roman Urdu extracted from various sources and social networking sites, e.g., Twitter. We train monolingual, multilingual, and bilingual models of Roman Urdu; on the Masked Language Modeling (MLM) task, the proposed bilingual model achieves 23% accuracy compared to 2% and 11% for the monolingual and multilingual models respectively.
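As a sketch of how MLM accuracy could be measured (the paper's exact evaluation protocol may differ), the snippet below masks each token of a sentence in turn and counts how often the model's top prediction recovers it; the model name and the sentence are placeholders rather than the paper's models and test set.

    # Placeholder masked-token prediction accuracy.
    import torch
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # placeholder model
    model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

    sentences = ["mujhe yeh kitab bohat pasand hai"]  # placeholder Roman Urdu sentence
    correct, total = 0, 0
    for sentence in sentences:
        inputs = tokenizer(sentence, return_tensors="pt")
        ids = inputs["input_ids"][0]
        for pos in range(1, len(ids) - 1):  # skip [CLS] and [SEP]
            masked_ids = ids.clone()
            masked_ids[pos] = tokenizer.mask_token_id
            with torch.no_grad():
                logits = model(input_ids=masked_ids.unsqueeze(0),
                               attention_mask=inputs["attention_mask"]).logits
            correct += int(logits[0, pos].argmax().item() == ids[pos].item())
            total += 1
    print(f"MLM accuracy: {correct / total:.2%}")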
Abstract: Urdu is a widely spoken language in South Asia. Though considerable literature exists for the Urdu language, the available data is still insufficient for processing the language with NLP techniques. Very efficient language models exist for English, a high-resource language, but Urdu and other under-resourced languages have long been neglected. To create efficient language models for these languages, we must have good word embedding models. For Urdu, the only available word embeddings are those trained and developed using the skip-gram model. In this paper, we build a corpus for Urdu by scraping and integrating data from various sources and compile a vocabulary for the Urdu language. We also adapt fastText embeddings and N-gram models so that they can be trained on our corpus. We use these trained embeddings for a word similarity task and compare the results with existing techniques.
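As an illustrative sketch (not the paper's training setup), the snippet below trains fastText embeddings with gensim on a toy tokenized corpus and runs a word-similarity query; the real training data is the Urdu corpus built in the paper.

    # Placeholder fastText training and word-similarity query.
    from gensim.models import FastText

    # Toy tokenized sentences standing in for the scraped Urdu corpus.
    sentences = [["یہ", "ایک", "مثال", "ہے"], ["اردو", "زبان", "کی", "مثال"]]

    model = FastText(sentences, vector_size=100, window=5, min_count=1, epochs=10)
    print(model.wv.most_similar("مثال", topn=3))
    print(model.wv.similarity("مثال", "زبان"))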
Abstract: Information verification is a challenging task because verifying a claim often requires combining information from multiple pieces of evidence that are linked by complex, hierarchical semantic relations. Previous research has mainly focused on simply concatenating multiple evidence sentences to accept or reject a claim. Such approaches are limited because evidence can contain hierarchical information and dependencies. In this research, we aim to verify facts based on evidence selected from a list of Wikipedia articles. Pretrained language models such as XLNet are used to generate meaningful representations, and graph-based attention and convolutions are applied in such a way that the system requires little additional training to learn to verify facts.
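As a rough illustration of the pipeline shape only (not the paper's architecture), the sketch below mean-pools XLNet token states into sentence vectors, lets the claim attend over the evidence vectors, and feeds the aggregate to an untrained placeholder verdict classifier; the graph construction, convolutions, and training described in the paper are more involved.

    # Placeholder claim-over-evidence attention on top of XLNet representations.
    import torch
    import torch.nn as nn
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("xlnet-base-cased")
    encoder = AutoModel.from_pretrained("xlnet-base-cased").eval()

    def encode(texts):
        # Mean-pool XLNet token states into one vector per sentence.
        inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
        with torch.no_grad():
            states = encoder(**inputs).last_hidden_state
        mask = inputs["attention_mask"].unsqueeze(-1)
        return (states * mask).sum(1) / mask.sum(1)

    claim_vec = encode(["The claim to verify."])                    # (1, 768)
    evidence_vecs = encode(["Evidence sentence one.",
                            "Evidence sentence two."])              # (n, 768)

    # Attention of the claim over the evidence, then an untrained placeholder
    # classifier over verdict labels (e.g., supports / refutes / not enough info).
    scores = torch.softmax(evidence_vecs @ claim_vec.T / 768 ** 0.5, dim=0)
    aggregated = (scores * evidence_vecs).sum(0, keepdim=True)
    verdict_logits = nn.Linear(2 * 768, 3)(torch.cat([claim_vec, aggregated], dim=-1))
    print(verdict_logits)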
Abstract: Privacy policies are legal documents that describe the practices an organization or company has adopted for handling the personal data of its users. Because they are legal documents, they are often written in extensive legal jargon that is difficult to understand. Although work has been done on privacy policies, none of it addresses the problem of verifying whether a given privacy policy adheres to the data protection laws of a given country or state. We aim to bridge that gap by providing a framework that analyzes privacy policies in light of various data protection laws, such as the General Data Protection Regulation (GDPR). To achieve this, we first label both the privacy policies and the laws. We then develop a correlation scheme to map the contents of a privacy policy to the appropriate segments of the law that the policy must conform to, and we check the compliance of the privacy policy's text with the corresponding text of the law using NLP techniques. With such a tool, users would be better equipped to understand how their personal data is managed. For now, we provide mappings for the GDPR and PDPA, but other laws can easily be incorporated into the existing pipeline.
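As a simplified illustration of the mapping step (the paper's compliance check uses richer NLP techniques than this similarity baseline), the snippet below scores each privacy-policy segment against labeled law segments with TF-IDF cosine similarity; the example segments are short placeholders.

    # Placeholder policy-to-law mapping via TF-IDF cosine similarity.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    policy_segments = [
        "We retain your personal data only as long as necessary for our services.",
        "You may request deletion of your account data at any time.",
    ]
    law_segments = {
        "GDPR Art. 5(1)(e) storage limitation":
            "Personal data shall be kept no longer than necessary.",
        "GDPR Art. 17 right to erasure":
            "The data subject shall have the right to obtain erasure of personal data.",
    }

    vectorizer = TfidfVectorizer().fit(policy_segments + list(law_segments.values()))
    policy_vecs = vectorizer.transform(policy_segments)
    law_vecs = vectorizer.transform(list(law_segments.values()))

    similarity = cosine_similarity(policy_vecs, law_vecs)
    for i, segment in enumerate(policy_segments):
        best = similarity[i].argmax()
        print(f"policy segment {i} -> {list(law_segments)[best]} "
              f"(score={similarity[i][best]:.2f})")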