Abstract:In modern healthcare, addressing the complexities of accurate disease prediction and personalized recommendations is both crucial and challenging. This research introduces MLtoGAI, which integrates Semantic Web technology with Machine Learning (ML) to enhance disease prediction and offer user-friendly explanations through ChatGPT. The system comprises three key components: a reusable disease ontology that incorporates detailed knowledge about various diseases, a diagnostic classification model that uses patient symptoms to detect specific diseases accurately, and the integration of Semantic Web Rule Language (SWRL) with ontology and ChatGPT to generate clear, personalized health advice. This approach significantly improves prediction accuracy and ensures results that are easy to understand, addressing the complexity of diseases and diverse symptoms. The MLtoGAI system demonstrates substantial advancements in accuracy and user satisfaction, contributing to developing more intelligent and accessible healthcare solutions. This innovative approach combines the strengths of ML algorithms with the ability to provide transparent, human-understandable explanations through ChatGPT, achieving significant improvements in prediction accuracy and user comprehension. By leveraging semantic technology and explainable AI, the system enhances the accuracy of disease prediction and ensures that the recommendations are relevant and easily understood by individual patients. Our research highlights the potential of integrating advanced technologies to overcome existing challenges in medical diagnostics, paving the way for future developments in intelligent healthcare systems. Additionally, the system is validated using 200 synthetic patient data records, ensuring robust performance and reliability.
Abstract:Forests are crucial for ecological balance, but wildfires, a major cause of forest loss, pose significant risks. Fire weather indices, which assess wildfire risk and predict resource demands, are vital. With the rise of sensor networks in fields like healthcare and environmental monitoring, semantic sensor networks are increasingly used to gather climatic data such as wind speed, temperature, and humidity. However, processing these data streams to determine fire weather indices presents challenges, underscoring the growing importance of effective forest fire detection. This paper discusses using Apache Spark for early forest fire detection, enhancing fire risk prediction with meteorological and geographical data. Building on our previous development of Semantic Sensor Network (SSN) ontologies and Semantic Web Rules Language (SWRL) for managing forest fires in Monesterial Natural Park, we expanded SWRL to improve a Decision Support System (DSS) using a Large Language Models (LLMs) and Spark framework. We implemented real-time alerts with Spark streaming, tailored to various fire scenarios, and validated our approach using ontology metrics, query-based evaluations, LLMs score precision, F1 score, and recall measures.
Abstract:Human being and different species of animals having the skills to gather, transferring knowledge, processing, fine-tune and generating information throughout their lifetime. The ability of learning throughout their lifespan is referred as continuous learning which is using neurocognition mechanism. Consequently, in real world computational system of incremental learning autonomous agents also needs such continuous learning mechanism which provide retrieval of information and long-term memory consolidation. However, the main challenge in artificial intelligence is that the incremental learning of the autonomous agent when new data confronted. In such scenarios, the main concern is catastrophic forgetting(CF), i.e., while learning the sequentially, neural network underfits the old data when it confronted with new data. To tackle this CF problem many numerous studied have been proposed, however it is very difficult to compare their performance due to dissimilarity in their evaluation mechanism. Here we focus on the comparison of all algorithms which are having similar type of evaluation mechanism. Here we are comparing three types of incremental learning methods: (1) Exemplar based methods, (2) Memory based methods, and (3) Network based method. In this survey paper, methodology oriented study for catastrophic forgetting in incremental deep neural network is addressed. Furthermore, it contains the mathematical overview of impact-full methods which can be help researchers to deal with CF.
Abstract:Liver diseases pose a significant global health burden, impacting a substantial number of individuals and exerting substantial economic and social consequences. Rising liver problems are considered a fatal disease in many countries, such as Egypt, Molda, etc. The objective of this study is to construct a predictive model for liver illness using Basic Formal Ontology (BFO) and detection rules derived from a decision tree algorithm. Based on these rules, events are detected through batch processing using the Apache Jena framework. Based on the event detected, queries can be directly processed using SPARQL. To make the ontology operational, these Decision Tree (DT) rules are converted into Semantic Web Rule Language (SWRL). Using this SWRL in the ontology for predicting different types of liver disease with the help of the Pellet and Drool inference engines in Protege Tools, a total of 615 records are taken from different liver diseases. After inferring the rules, the result can be generated for the patient according to the DT rules, and other patient-related details along with different precautionary suggestions can be obtained based on these results. Combining query results of batch processing and ontology-generated results can give more accurate suggestions for disease prevention and detection. This work aims to provide a comprehensive approach that is applicable for liver disease prediction, rich knowledge graph representation, and smart querying capabilities. The results show that combining RDF data, SWRL rules, and SPARQL queries for analysing and predicting liver disease can help medical professionals to learn more about liver diseases and make a Decision Support System (DSS) for health care.
Abstract:Vector-borne diseases (VBDs) are a kind of infection caused through the transmission of vectors generated by the bites of infected parasites, bacteria, and viruses, such as ticks, mosquitoes, triatomine bugs, blackflies, and sandflies. If these diseases are not properly treated within a reasonable time frame, the mortality rate may rise. In this work, we propose a set of ontologies that will help in the diagnosis and treatment of vector-borne diseases. For developing VBD's ontology, electronic health records taken from the Indian Health Records website, text data generated from Indian government medical mobile applications, and doctors' prescribed handwritten notes of patients are used as input. This data is then converted into correct text using Optical Character Recognition (OCR) and a spelling checker after pre-processing. Natural Language Processing (NLP) is applied for entity extraction from text data for making Resource Description Framework (RDF) medical data with the help of the Patient Clinical Data (PCD) ontology. Afterwards, Basic Formal Ontology (BFO), National Vector Borne Disease Control Program (NVBDCP) guidelines, and RDF medical data are used to develop ontologies for VBDs, and Semantic Web Rule Language (SWRL) rules are applied for diagnosis and treatment. The developed ontology helps in the construction of decision support systems (DSS) for the NVBDCP to control these diseases.
Abstract:SARS-COV-19 is the most prominent issue which many countries face today. The frequent changes in infections, recovered and deaths represents the dynamic nature of this pandemic. It is very crucial to predict the spreading rate of this virus for accurate decision making against fighting with the situation of getting infected through the virus, tracking and controlling the virus transmission in the community. We develop a prediction model using statistical time series models such as SARIMA and FBProphet to monitor the daily active, recovered and death cases of COVID-19 accurately. Then with the help of various details across each individual patient (like height, weight, gender etc.), we designed a set of rules using Semantic Web Rule Language and some mathematical models for dealing with COVID19 infected cases on an individual basis. After combining all the models, a COVID-19 Ontology is developed and performs various queries using SPARQL query on designed Ontology which accumulate the risk factors, provide appropriate diagnosis, precautions and preventive suggestions for COVID Patients. After comparing the performance of SARIMA and FBProphet, it is observed that the SARIMA model performs better in forecasting of COVID cases. On individual basis COVID case prediction, approx. 497 individual samples have been tested and classified into five different levels of COVID classes such as Having COVID, No COVID, High Risk COVID case, Medium to High Risk case, and Control needed case.
Abstract:Surveillance footage can catch a wide range of realistic anomalies. This research suggests using a weakly supervised strategy to avoid annotating anomalous segments in training videos, which is time consuming. In this approach only video level labels are used to obtain frame level anomaly scores. Weakly supervised video anomaly detection (WSVAD) suffers from the wrong identification of abnormal and normal instances during the training process. Therefore it is important to extract better quality features from the available videos. WIth this motivation, the present paper uses better quality transformer-based features named Videoswin Features followed by the attention layer based on dilated convolution and self attention to capture long and short range dependencies in temporal domain. This gives us a better understanding of available videos. The proposed framework is validated on real-world dataset i.e. ShanghaiTech Campus dataset which results in competitive performance than current state-of-the-art methods. The model and the code are available at https://github.com/kapildeshpande/Anomaly-Detection-in-Surveillance-Videos
Abstract:With healthcare being critical aspect, health insurance has become an important scheme in minimizing medical expenses. Following this, the healthcare industry has seen a significant increase in fraudulent activities owing to increased insurance, and fraud has become a significant contributor to rising medical care expenses, although its impact can be mitigated using fraud detection techniques. To detect fraud, machine learning techniques are used. The Centers for Medicaid and Medicare Services (CMS) of the United States federal government released "Medicare Part D" insurance claims is utilized in this study to develop fraud detection system. Employing machine learning algorithms on a class-imbalanced and high dimensional medicare dataset is a challenging task. To compact such challenges, the present work aims to perform feature extraction following data sampling, afterward applying various classification algorithms, to get better performance. Feature extraction is a dimensionality reduction approach that converts attributes into linear or non-linear combinations of the actual attributes, generating a smaller and more diversified set of attributes and thus reducing the dimensions. Data sampling is commonlya used to address the class imbalance either by expanding the frequency of minority class or reducing the frequency of majority class to obtain approximately equal numbers of occurrences for both classes. The proposed approach is evaluated through standard performance metrics. Thus, to detect fraud efficiently, this study applies autoencoder as a feature extraction technique, synthetic minority oversampling technique (SMOTE) as a data sampling technique, and various gradient boosted decision tree-based classifiers as a classification algorithm. The experimental results show the combination of autoencoders followed by SMOTE on the LightGBM classifier achieved best results.
Abstract:Recent advancement in the field of pervasive healthcare monitoring systems causes the generation of a huge amount of lifelog data in real-time. Chronic diseases are one of the most serious health challenges in developing and developed countries. According to WHO, this accounts for 73% of all deaths and 60% of the global burden of diseases. Chronic disease classification models are now harnessing the potential of lifelog data to explore better healthcare practices. This paper is to construct an optimal feature selection-based unsupervised logistic regression model (OFS-ULR) to classify chronic diseases. Since lifelog data analysis is crucial due to its sensitive nature; thus the conventional classification models show limited performance. Therefore, designing new classifiers for the classification of chronic diseases using lifelog data is the need of the age. The vital part of building a good model depends on pre-processing of the dataset, identifying important features, and then training a learning algorithm with suitable hyper parameters for better performance. The proposed approach improves the performance of existing methods using a series of steps such as (i) removing redundant or invalid instances, (ii) making the data labelled using clustering and partitioning the data into classes, (iii) identifying the suitable subset of features by applying either some domain knowledge or selection algorithm, (iv) hyper parameter tuning for models to get best results, and (v) performance evaluation using Spark streaming environment. For this purpose, two-time series datasets are used in the experiment to compute the accuracy, recall, precision, and f1-score. The experimental analysis proves the suitability of the proposed approach as compared to the conventional classifiers and our newly constructed model achieved highest accuracy and reduced training complexity among all among all.
Abstract:Stress, anxiety, and nervousness are all high-risk health states in everyday life. Previously, stress levels were determined by speaking with people and gaining insight into what they had experienced recently or in the past. Typically, stress is caused by an incidence that occurred a long time ago, but sometimes it is triggered by unknown factors. This is a challenging and complex task, but recent research advances have provided numerous opportunities to automate it. The fundamental features of most of these techniques are electro dermal activity (EDA) and heart rate values (HRV). We utilized an accelerometer to measure body motions to solve this challenge. The proposed novel method employs a test that measures a subject's electrocardiogram (ECG), galvanic skin values (GSV), HRV values, and body movements in order to provide a low-cost and time-saving solution for detecting stress lifestyle disease in modern times using cyber physical systems. This study provides a new hybrid model for lifestyle disease classification that decreases execution time while picking the best collection of characteristics and increases classification accuracy. The developed approach is capable of dealing with the class imbalance problem by using WESAD (wearable stress and affect dataset) dataset. The new model uses the Grid search (GS) method to select an optimized set of hyper parameters, and it uses a combination of the Correlation coefficient based Recursive feature elimination (CoC-RFE) method for optimal feature selection and gradient boosting as an estimator to classify the dataset, which achieves high accuracy and helps to provide smart, accurate, and high-quality healthcare systems. To demonstrate the validity and utility of the proposed methodology, its performance is compared to those of other well-established machine learning models.