Abstract:Overfitting remains a significant challenge in deep learning, often arising from data outliers, noise, and limited training data. To address this, we propose Divide2Conquer (D2C), a novel technique to mitigate overfitting. D2C partitions the training data into multiple subsets and trains identical models independently on each subset. To balance model generalization and subset-specific learning, the model parameters are periodically aggregated and averaged during training. This process enables the learning of robust patterns while minimizing the influence of outliers and noise. Empirical evaluations on benchmark datasets across diverse deep-learning tasks demonstrate that D2C significantly enhances generalization performance, particularly with larger datasets. Our analysis includes evaluations of decision boundaries, loss curves, and other performance metrics, highlighting D2C's effectiveness both as a standalone technique and in combination with other overfitting reduction methods. We further provide a rigorous mathematical justification for D2C's underlying principles and examine its applicability across multiple domains. Finally, we explore the trade-offs associated with D2C and propose strategies to address them, offering a holistic view of its strengths and limitations. This study establishes D2C as a versatile and effective approach to combating overfitting in deep learning. Our codes are publicly available at: https://github.com/Saiful185/Divide2Conquer.
Abstract:This work focuses on the efficiency of the knowledge distillation approach in generating a lightweight yet powerful BERT based model for natural language processing applications. After the model creation, we applied the resulting model, LastBERT, to a real-world task classifying severity levels of Attention Deficit Hyperactivity Disorder (ADHD)-related concerns from social media text data. Referring to LastBERT, a customized student BERT model, we significantly lowered model parameters from 110 million BERT base to 29 million, resulting in a model approximately 73.64% smaller. On the GLUE benchmark, comprising paraphrase identification, sentiment analysis, and text classification, the student model maintained strong performance across many tasks despite this reduction. The model was also used on a real-world ADHD dataset with an accuracy and F1 score of 85%. When compared to DistilBERT (66M) and ClinicalBERT (110M), LastBERT demonstrated comparable performance, with DistilBERT slightly outperforming it at 87%, and ClinicalBERT achieving 86% across the same metrics. These findings highlight the LastBERT model's capacity to classify degrees of ADHD severity properly, so it offers a useful tool for mental health professionals to assess and comprehend material produced by users on social networking platforms. The study emphasizes the possibilities of knowledge distillation to produce effective models fit for use in resource-limited conditions, hence advancing NLP and mental health diagnosis. Furthermore underlined by the considerable decrease in model size without appreciable performance loss is the lower computational resources needed for training and deployment, hence facilitating greater applicability. Especially using readily available computational tools like Google Colab. This study shows the accessibility and usefulness of advanced NLP methods in pragmatic world applications.
Abstract:Knee osteoarthritis(KO) is a degenerative joint disease that can cause severe pain and impairment. With increased prevalence, precise diagnosis by medical imaging analytics is crucial for appropriate illness management. This research investigates a comparative analysis between traditional machine learning techniques and new deep learning models for diagnosing KO severity from X-ray pictures. This study does not introduce new architectural innovations but rather illuminates the robust applicability and comparative effectiveness of pre-existing ViT models in a medical imaging context, specifically for KO severity diagnosis. The insights garnered from this comparative analysis advocate for the integration of advanced ViT models in clinical diagnostic workflows, potentially revolutionizing the precision and reliability of KO assessments. This study does not introduce new architectural innovations but rather illuminates the robust applicability and comparative effectiveness of pre-existing ViT models in a medical imaging context, specifically for KO severity diagnosis. The insights garnered from this comparative analysis advocate for the integration of advanced ViT models in clinical diagnostic workflows, potentially revolutionizing the precision & reliability of KO assessments. The study utilizes an osteoarthritis dataset from the Osteoarthritis Initiative (OAI) comprising images with 5 severity categories and uneven class distribution. While classic machine learning models like GaussianNB and KNN struggle in feature extraction, Convolutional Neural Networks such as Inception-V3, VGG-19 achieve better accuracy between 55-65% by learning hierarchical visual patterns. However, Vision Transformer architectures like Da-VIT, GCViT and MaxViT emerge as indisputable champions, displaying 66.14% accuracy, 0.703 precision, 0.614 recall, AUC exceeding 0.835 thanks to self-attention processes.
Abstract:Data fabric is an automated and AI-driven data fusion approach to accomplish data management unification without moving data to a centralized location for solving complex data problems. In a Federated learning architecture, the global model is trained based on the learned parameters of several local models that eliminate the necessity of moving data to a centralized repository for machine learning. This paper introduces a secure approach for medical image analysis using federated learning and partially homomorphic encryption within a distributed data fabric architecture. With this method, multiple parties can collaborate in training a machine-learning model without exchanging raw data but using the learned or fused features. The approach complies with laws and regulations such as HIPAA and GDPR, ensuring the privacy and security of the data. The study demonstrates the method's effectiveness through a case study on pituitary tumor classification, achieving a significant level of accuracy. However, the primary focus of the study is on the development and evaluation of federated learning and partially homomorphic encryption as tools for secure medical image analysis. The results highlight the potential of these techniques to be applied to other privacy-sensitive domains and contribute to the growing body of research on secure and privacy-preserving machine learning.
Abstract:Criminal and suspicious activity detection has become a popular research topic in recent years. The rapid growth of computer vision technologies has had a crucial impact on solving this issue. However, physical stalking detection is still a less explored area despite the evolution of modern technology. Nowadays, stalking in public places has become a common occurrence with women being the most affected. Stalking is a visible action that usually occurs before any criminal activity begins as the stalker begins to follow, loiter, and stare at the victim before committing any criminal activity such as assault, kidnapping, rape, and so on. Therefore, it has become a necessity to detect stalking as all of these criminal activities can be stopped in the first place through stalking detection. In this research, we propose a novel deep learning-based hybrid fusion model to detect potential stalkers from a single video with a minimal number of frames. We extract multiple relevant features, such as facial landmarks, head pose estimation, and relative distance, as numerical values from video frames. This data is fed into a multilayer perceptron (MLP) to perform a classification task between a stalking and a non-stalking scenario. Simultaneously, the video frames are fed into a combination of convolutional and LSTM models to extract the spatio-temporal features. We use a fusion of these numerical and spatio-temporal features to build a classifier to detect stalking incidents. Additionally, we introduce a dataset consisting of stalking and non-stalking videos gathered from various feature films and television series, which is also used to train the model. The experimental results show the efficiency and dynamism of our proposed stalker detection system, achieving 89.58% testing accuracy with a significant improvement as compared to the state-of-the-art approaches.
Abstract:Chest X-rays are widely used to diagnose thoracic diseases, but the lack of detailed information about these abnormalities makes it challenging to develop accurate automated diagnosis systems, which is crucial for early detection and effective treatment. To address this challenge, we employed deep learning techniques to identify patterns in chest X-rays that correspond to different diseases. We conducted experiments on the "ChestX-ray14" dataset using various pre-trained CNNs, transformers, hybrid(CNN+Transformer) models and classical models. The best individual model was the CoAtNet, which achieved an area under the receiver operating characteristic curve (AUROC) of 84.2%. By combining the predictions of all trained models using a weighted average ensemble where the weight of each model was determined using differential evolution, we further improved the AUROC to 85.4%, outperforming other state-of-the-art methods in this field. Our findings demonstrate the potential of deep learning techniques, particularly ensemble deep learning, for improving the accuracy of automatic diagnosis of thoracic diseases from chest X-rays.
Abstract:The traditional dietary recommendation systems are basically nutrition or health-aware where the human feelings on food are ignored. Human affects vary when it comes to food cravings, and not all foods are appealing in all moods. A questionnaire-based and preference-aware meal recommendation system can be a solution. However, automated recognition of social affects on different foods and planning the menu considering nutritional demand and social-affect has some significant benefits of the questionnaire-based and preference-aware meal recommendations. A patient with severe illness, a person in a coma, or patients with locked-in syndrome and amyotrophic lateral sclerosis (ALS) cannot express their meal preferences. Therefore, the proposed framework includes a social-affective computing module to recognize the affects of different meals where the person's affect is detected using electroencephalography signals. EEG allows to capture the brain signals and analyze them to anticipate affective toward a food. In this study, we have used a 14-channel wireless Emotive Epoc+ to measure affectivity for different food items. A hierarchical ensemble method is applied to predict affectivity upon multiple feature extraction methods and TOPSIS (Technique for Order of Preference by Similarity to Ideal Solution) is used to generate a food list based on the predicted affectivity. In addition to the meal recommendation, an automated menu planning approach is also proposed considering a person's energy intake requirement, affectivity, and nutritional values of the different menus. The bin-packing algorithm is used for the personalized menu planning of breakfast, lunch, dinner, and snacks. The experimental findings reveal that the suggested affective computing, meal recommendation, and menu planning algorithms perform well across a variety of assessment parameters.
Abstract:The core purpose of developing artificial neural networks was to mimic the functionalities of biological neural networks. However, unlike biological neural networks, traditional artificial neural networks are often structured hierarchically, which can impede the flow of information between neurons as the neurons in the same layer have no connections between them. Hence, we propose a more robust model of artificial neural networks where the hidden neurons, residing in the same hidden layer, are interconnected, enabling the neurons to learn complex patterns and speeding up the convergence rate. With the experimental study of our proposed model as fully connected layers in shallow and deep networks, we demonstrate that the model results in a significant increase in convergence rate.
Abstract:Human conversational styles are measured by the sense of humor, personality, and tone of voice. These characteristics have become essential for conversational intelligent virtual assistants. However, most of the state-of-the-art intelligent virtual assistants (IVAs) are failed to interpret the affective semantics of human voices. This research proposes an anthropomorphic intelligent system that can hold a proper human-like conversation with emotion and personality. A voice style transfer method is also proposed to map the attributes of a specific emotion. Initially, the frequency domain data (Mel-Spectrogram) is created by converting the temporal audio wave data, which comprises discrete patterns for audio features such as notes, pitch, rhythm, and melody. A collateral CNN-Transformer-Encoder is used to predict seven different affective states from voice. The voice is also fed parallelly to the deep-speech, an RNN model that generates the text transcription from the spectrogram. Then the transcripted text is transferred to the multi-domain conversation agent using blended skill talk, transformer-based retrieve-and-generate generation strategy, and beam-search decoding, and an appropriate textual response is generated. The system learns an invertible mapping of data to a latent space that can be manipulated and generates a Mel-spectrogram frame based on previous Mel-spectrogram frames to voice synthesize and style transfer. Finally, the waveform is generated using WaveGlow from the spectrogram. The outcomes of the studies we conducted on individual models were auspicious. Furthermore, users who interacted with the system provided positive feedback, demonstrating the system's effectiveness.
Abstract:Handwritten character recognition is a hot topic for research nowadays. If we can convert a handwritten piece of paper into a text-searchable document using the Optical Character Recognition (OCR) technique, we can easily understand the content and do not need to read the handwritten document. OCR in the English language is very common, but in the Bengali language, it is very hard to find a good quality OCR application. If we can merge machine learning and deep learning with OCR, it could be a huge contribution to this field. Various researchers have proposed a number of strategies for recognizing Bengali handwritten characters. A lot of ML algorithms and deep neural networks were used in their work, but the explanations of their models are not available. In our work, we have used various machine learning algorithms and CNN to recognize handwritten Bengali digits. We have got acceptable accuracy from some ML models, and CNN has given us great testing accuracy. Grad-CAM was used as an XAI method on our CNN model, which gave us insights into the model and helped us detect the origin of interest for recognizing a digit from an image.