Abstract:Machine learning models have been progressively used for predicting materials properties. These models can be built using pre-existing data and are useful for rapidly screening the physicochemical space of a material, which is astronomically large. However, ML models are inherently interpolative, and their efficacy for searching candidates outside a material's known range of property is unresolved. Moreover, the performance of an ML model is intricately connected to its learning strategy and the volume of training data. Here, we determine the relationship between the extrapolation ability of an ML model, the size and range of its training dataset, and its learning approach. We focus on a canonical problem of predicting the properties of a copolymer as a function of the sequence of its monomers. Tree search algorithms, which learn the similarity between polymer structures, are found to be inefficient for extrapolation. Conversely, the extrapolation capability of neural networks and XGBoost models, which attempt to learn the underlying functional correlation between the structure and property of polymers, show strong correlations with the volume and range of training data. These findings have important implications on ML-based new material development.
Abstract:Heart disease is a serious global health issue that claims millions of lives every year. Early detection and precise prediction are critical to the prevention and successful treatment of heart related issues. A lot of research utilizes machine learning (ML) models to forecast cardiac disease and obtain early detection. In order to do predictive analysis on "Heart disease health indicators " dataset. We employed five machine learning methods in this paper: Decision Tree (DT), Random Forest (RF), Linear Discriminant Analysis, Extra Tree Classifier, and AdaBoost. The model is further examined using various feature selection (FS) techniques. To enhance the baseline model, we have separately applied four FS techniques: Sequential Forward FS, Sequential Backward FS, Correlation Matrix, and Chi2. Lastly, K means SMOTE oversampling is applied to the models to enable additional analysis. The findings show that when it came to predicting heart disease, ensemble approaches in particular, random forests performed better than individual classifiers. The presence of smoking, blood pressure, cholesterol, and physical inactivity were among the major predictors that were found. The accuracy of the Random Forest and Decision Tree model was 99.83%. This paper demonstrates how machine learning models can improve the accuracy of heart disease prediction, especially when using ensemble methodologies. The models provide a more accurate risk assessment than traditional methods since they incorporate a large number of factors and complex algorithms.
Abstract:Sentiment analysis (SA), is an approach of natural language processing (NLP) for determining a text's emotional tone by analyzing subjective information such as views, feelings, and attitudes toward specific topics, products, services, events, or experiences. This study attempts to develop an advanced deep learning (DL) model for SA to understand global audience emotions through tweets in the context of the Olympic Games. The findings represent global attitudes around the Olympics and contribute to advancing the SA models. We have used NLP for tweet pre-processing and sophisticated DL models for arguing with SA, this research enhances the reliability and accuracy of sentiment classification. The study focuses on data selection, preprocessing, visualization, feature extraction, and model building, featuring a baseline Na\"ive Bayes (NB) model and three advanced DL models: Convolutional Neural Network (CNN), Bidirectional Long Short-Term Memory (BiLSTM), and Bidirectional Encoder Representations from Transformers (BERT). The results of the experiments show that the BERT model can efficiently classify sentiments related to the Olympics, achieving the highest accuracy of 99.23%.