Abstract:Today, the existence of a vast amount of electronic health data has created potential capacities for conducting studies aiming to improve the medical services provided to patients and reduce the costs of the healthcare system. One of the topics that has been receiving attention in the field of medicine in recent years is the identification of patients who are likely to be re-hospitalized shortly after being discharged from the hospital. This identification can help doctors choose appropriate treatment methods, thereby reducing the rate of patient re-hospitalization and resulting in effective treatment cost reduction. In this study, the prediction of patient re-hospitalization using text mining approaches and the processing of discharge report texts in the patient's electronic file has been discussed. To this end, the performance of various machine learning models has been evaluated using two approaches: bag of word and bag of concept, in the process of predicting patient readmission. Comparing the efficiency of these approaches has shown the superiority of the random forest model and the bag of concept approach over other machine learning models and approaches. This research has achieved the highest score in predicting the probability of patient re-hospitalization, with a recall score of 68.9%, compared to similar works that have utilized machine learning models in this field.
Abstract:In today's digital world, social media plays a significant role in facilitating communication and content sharing. However, the exponential rise in user-generated content has led to challenges in maintaining a respectful online environment. In some cases, users have taken advantage of anonymity in order to use harmful language, which can negatively affect the user experience and pose serious social problems. Recognizing the limitations of manual moderation, automatic detection systems have been developed to tackle this problem. Nevertheless, several obstacles persist, including the absence of a universal definition for harmful language, inadequate datasets across languages, the need for detailed annotation guideline, and most importantly, a comprehensive framework. This study aims to address these challenges by introducing, for the first time, a detailed framework adaptable to any language. This framework encompasses various aspects of harmful language detection. A key component of the framework is the development of a general and detailed annotation guideline. Additionally, the integration of sentiment analysis represents a novel approach to enhancing harmful language detection. Also, a definition of harmful language based on the review of different related concepts is presented. To demonstrate the effectiveness of the proposed framework, its implementation in a challenging low-resource language is conducted. We collected a Persian dataset and applied the annotation guideline for harmful detection and sentiment analysis. Next, we present baseline experiments utilizing machine and deep learning methods to set benchmarks. Results prove the framework's high performance, achieving an accuracy of 99.4% in offensive language detection and 66.2% in sentiment analysis.
Abstract:Dental diseases have a significant impact on a considerable portion of the population, leading to various health issues that can detrimentally affect individuals' overall well-being. The integration of automated systems in oral healthcare has become increasingly crucial. Machine learning approaches offer a viable solution to address challenges such as diagnostic difficulties, inefficiencies, and errors in oral disease diagnosis. These methods prove particularly useful when physicians struggle to predict or diagnose diseases at their early stages. In this study, thirteen different machine learning, deep learning, and large language models were employed to determine the severity level of oral health issues based on radiologists' reports. The results revealed that the Few-shot learning with SBERT and Multi-Layer Perceptron model outperformed all other models across various experiments, achieving an impressive accuracy of 94.1% as the best result. Consequently, this model exhibits promise as a reliable tool for evaluating the severity of oral diseases, enabling patients to receive more effective treatment and aiding healthcare professionals in making informed decisions regarding resource allocation and the management of high-risk patients.
Abstract:The COVID-19 pandemic has disrupted the global economy and people's daily lives in unprecedented ways. To make appropriate decisions, it is necessary to diagnose COVID-19 rapidly and accurately. Clinical decision making is influenced by data collected from patients. With the aid of artificial intelligence, COVID-19 has been diagnosed quickly by analyzing symptoms, polymerase chain reaction (PCR), computed tomography scans, chest X-rays, routine laboratory blood tests and even cough sounds. Furthermore, these data can be used to predict a patient's morality, although there is a question about which data makes the most accurate predictions. Therefore, this study consists of two parts. Our first objective is to examine whether machine learning algorithms can predict the outcome of COVID-19 cases (recovery or death), based on the features present in the dataset. In the second part of the research, we investigated the impact of clinical and RT-PCR on prediction of recovery and decease to determine which one is more reliable. We defined four stages with different feature sets and use six machine learning methods to build prediction model. With an accuracy of 78.7%, random forest showed promising results for predicting death and recovery of patients. Based on this, it appears that recovery and decease of patients are predictable using machine learning. For second objective, results indicate that clinical alone (without using RT-PCR), trained with AdaBoost algorithm, is the most accurate with an accuracy of 82.1%. This study can provide guidance for medical professionals in the event of a crisis or outbreak similar to COVID-19.
Abstract:The COVID-19 pandemic has a devastating impact globally, claiming millions of lives and causing significant social and economic disruptions. In order to optimize decision-making and allocate limited resources, it is essential to identify COVID-19 symptoms and determine the severity of each case. Machine learning algorithms offer a potent tool in the medical field, particularly in mining clinical datasets for useful information and guiding scientific decisions. Association rule mining is a machine learning technique for extracting hidden patterns from data. This paper presents an application of association rule mining based Apriori algorithm to discover symptom patterns from COVID-19 patients. The study, using 2875 records of patient, identified the most common symptoms as apnea (72%), cough (64%), fever (59%), weakness (18%), myalgia (14.5%), and sore throat (12%). The proposed method provides clinicians with valuable insight into disease that can assist them in managing and treating it effectively.
Abstract:Scientific literature contains a considerable amount of information that provides an excellent opportunity for developing text mining methods to extract biomedical relationships. An important type of information is the relationship between singular nucleotide polymorphisms (SNP) and traits. In this paper, we present a BioBERT-GRU method to identify SNP- traits associations. Based on the evaluation of our method on the SNPPhenA dataset, it is concluded that this new method performs better than previous machine learning and deep learning based methods. BioBERT-GRU achieved the result a precision of 0.883, recall of 0.882 and F1-score of 0.881.
Abstract:Sentiment analysis is the process of identifying and categorizing people's emotions or opinions regarding various topics. The analysis of Twitter sentiment has become an increasingly popular topic in recent years. In this paper, we present several machine learning and a deep learning model to analysis sentiment of Persian political tweets. Our analysis was conducted using Bag of Words and ParsBERT for word representation. We applied Gaussian Naive Bayes, Gradient Boosting, Logistic Regression, Decision Trees, Random Forests, as well as a combination of CNN and LSTM to classify the polarities of tweets. The results of this study indicate that deep learning with ParsBERT embedding performs better than machine learning. The CNN-LSTM model had the highest classification accuracy with 89 percent on the first dataset with three classes and 71 percent on the second dataset with seven classes. Due to the complexity of Persian, it was a difficult task to achieve this level of efficiency.
Abstract:Artificial intelligence has achieved significant success in handling complex tasks in recent years. This success is due to advances in machine learning algorithms and hardware acceleration. In order to obtain more accurate results and solve more complex problems, algorithms must be trained with more data. This huge amount of data could be time-consuming to process and require a great deal of computation. This solution could be achieved by distributing the data and algorithm across several machines, which is known as distributed machine learning. There has been considerable effort put into distributed machine learning algorithms, and different methods have been proposed so far. In this article, we present a comprehensive summary of the current state-of-the-art in the field through the review of these algorithms. We divide this algorithms in classification and clustering (traditional machine learning), deep learning and deep reinforcement learning groups. Distributed deep learning has gained more attention in recent years and most of studies worked on this algorithms. As a result, most of the articles we discussed here belong to this category. Based on our investigation of algorithms, we highlight limitations that should be addressed in future research.
Abstract:Digital technologies have led to an influx of text created daily in a variety of languages, styles, and formats. A great deal of the popularity of spell-checking systems can be attributed to this phenomenon since they are crucial to polishing the digitally conceived text. In this study, we tackle Typographical Error Type Detection in Persian, which has been relatively understudied. In this paper, we present a public dataset named FarsTypo, containing 3.4 million chronologically ordered and part-of-speech tagged words of diverse topics and linguistic styles. An algorithm for applying Persian-specific errors is developed and applied to a scalable size of these words, forming a parallel dataset of correct and incorrect words. Using FarsTypo, we establish a firm baseline and compare different methodologies using various architectures. In addition, we present a novel Many-to-Many Deep Sequential Neural Network to perform token classification using both word and character embeddings in combination with bidirectional LSTM layers to detect typographical errors across 51 classes. We compare our approach with highly-advanced industrial systems that, unlike this study, have been developed utilizing a variety of resources. The results of our final method were competitive in that we achieved an accuracy of 97.62%, a precision of 98.83%, a recall of 98.61%, and outperformed the rest in terms of speed.
Abstract:Parallel to the rising debates over sustainable energy and artificial intelligence solutions, the world is currently discussing the ethics of artificial intelligence and its possible negative effects on society and the environment. In these arguments, sustainable AI is proposed, which aims at advancing the pathway toward sustainability, such as sustainable energy. In this paper, we offered a novel contextual topic modeling combining LDA, BERT, and Clustering. We then combined these computational analyses with content analysis of related scientific publications to identify the main scholarly topics, sub-themes, and cross-topic themes within scientific research on sustainable AI in energy. Our research identified eight dominant topics including sustainable buildings, AI-based DSSs for urban water management, climate artificial intelligence, Agriculture 4, the convergence of AI with IoT, AI-based evaluation of renewable technologies, smart campus and engineering education, and AI-based optimization. We then recommended 14 potential future research strands based on the observed theoretical gaps. Theoretically, this analysis contributes to the existing literature on sustainable AI and sustainable energy, and practically, it intends to act as a general guide for energy engineers and scientists, AI scientists, and social scientists to widen their knowledge of sustainability in AI and energy convergence research.