Abstract:Cell-penetrating peptides (CPPs) are powerful vectors for the intracellular delivery of a diverse array of therapeutic molecules. Despite their potential, the rational design of CPPs remains a challenging task that often requires extensive experimental efforts and iterations. In this study, we introduce an innovative approach for the de novo design of CPPs, leveraging the strengths of machine learning (ML) and optimization algorithms. Our strategy, named LightCPPgen, integrates a LightGBM-based predictive model with a genetic algorithm (GA), enabling the systematic generation and optimization of CPP sequences. At the core of our methodology is the development of an accurate, efficient, and interpretable predictive model, which utilizes 20 explainable features to shed light on the critical factors influencing CPP translocation capacity. The CPP predictive model works synergistically with an optimization algorithm, which is tuned to enhance computational efficiency while maintaining optimization performance. The GA solutions specifically target the candidate sequences' penetrability score, while trying to maximize similarity with the original non-penetrating peptide in order to retain its original biological and physicochemical properties. By prioritizing the synthesis of only the most promising CPP candidates, LightCPPgen can drastically reduce the time and cost associated with wet lab experiments. In summary, our research makes a substantial contribution to the field of CPP design, offering a robust framework that combines ML and optimization techniques to facilitate the rational design of penetrating peptides, by enhancing the explainability and interpretability of the design process.
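The optimization loop described in the abstract above (a genetic algorithm that raises a penetrability score while staying close to the original peptide) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `toy_penetrability` is a hypothetical stand-in that merely counts cationic residues and is not the LightGBM model, and the fitness weighting, mutation rate, and selection scheme are arbitrary choices rather than the settings used in LightCPPgen.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def toy_penetrability(seq):
    """Hypothetical stand-in scorer (NOT the paper's model): fraction of R/K residues."""
    return sum(res in "RK" for res in seq) / len(seq)

def similarity(seq, original):
    """Fraction of positions identical to the original peptide."""
    return sum(a == b for a, b in zip(seq, original)) / len(original)

def fitness(seq, original, weight=0.5):
    """Assumed objective: predicted score plus a weighted similarity term."""
    return toy_penetrability(seq) + weight * similarity(seq, original)

def mutate(seq, rng, rate=0.1):
    """Replace each residue with a random amino acid with probability `rate`."""
    return "".join(rng.choice(AMINO_ACIDS) if rng.random() < rate else res
                   for res in seq)

def crossover(a, b, rng):
    """One-point crossover between two equal-length sequences."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def optimize(original, generations=40, pop_size=30, seed=0):
    """Evolve variants of `original`; the fitter half survives each generation."""
    rng = random.Random(seed)
    population = [original] + [mutate(original, rng) for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda s: fitness(s, original), reverse=True)
        parents = ranked[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=lambda s: fitness(s, original))
```

Because the fitter half of the population is carried over unchanged, the best fitness never decreases between generations. In a real pipeline the stand-in scorer would be replaced by a trained classifier's predicted probability.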
Abstract:This dataset encompasses fluorescence spectra and chemical parameters of 24 olive oil samples from the 2019-2020 harvest provided by the producer Conde de Benalua, Granada, Spain. The oils are characterized by different qualities: 10 extra virgin olive oil (EVOO), 8 virgin olive oil (VOO), and 6 lampante olive oil (LOO) samples. For each sample, the dataset includes fluorescence spectra obtained with two excitation wavelengths, oil quality, and five chemical parameters necessary for the quality assessment of olive oil. The fluorescence spectra were obtained by exciting the samples at 365 nm and 395 nm under identical conditions. For each olive oil sample, the dataset includes the values of the following chemical parameters: acidity, peroxide value, K270, K232, and ethyl esters, together with the quality class of the sample (EVOO, VOO, or LOO). Because both spectroscopic and chemical data are available, the dataset gives researchers in food technology the opportunity to develop machine learning models that assess olive oil quality from fluorescence data. The dataset can be used, for example, to predict one or multiple chemical parameters or to classify samples based on their quality from fluorescence spectra.
Abstract:The olive oil sector has a substantial impact on the Mediterranean economy and lifestyle. Many studies have sought to optimize the different steps of the olive oil production process. One of the main challenges for olive oil producers is assessing and controlling quality during the production cycle. For this purpose, several parameters need to be determined, such as the acidity, the UV absorption, or the ethyl ester content. To achieve this, samples must be sent to an approved laboratory for chemical analysis. This approach is expensive and cannot be performed very frequently, making quality control of olive oil a real challenge. This work explores a new approach based on fluorescence spectroscopy and artificial intelligence (namely, 1-D convolutional neural networks) to predict the five chemical quality indicators of olive oil (acidity, peroxide value, UV spectroscopic parameters $K_{270}$ and $K_{232}$, and ethyl esters) from simple fluorescence spectra. Fluorescence spectroscopy is a very attractive optical technique since it does not require sample preparation, is non-destructive, and, as shown in this work, can be easily implemented in small and cost-effective sensors. The results indicate that the proposed approach gives exceptional results in quality determination and would make continuous quality control of olive oil during and after the production process a reality. Additionally, this novel methodology has potential applications in supporting the quality specifications of olive oil, as defined by European regulation.
Abstract:This paper presents the intrinsic limit determination algorithm (ILD Algorithm), a novel technique to determine the best possible performance, measured in terms of the AUC (area under the ROC curve) and accuracy, that can be obtained from a specific dataset in a binary classification problem with categorical features \emph{regardless} of the model used. This limit, namely the Bayes error, is completely independent of any model used and describes an intrinsic property of the dataset. The ILD algorithm thus provides important information regarding the prediction limits of any binary classification algorithm when applied to the considered dataset. In this paper the algorithm is described in detail, its entire mathematical framework is presented and the pseudocode is given to facilitate its implementation. Finally, an example with a real dataset is given.
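The idea of a model-independent performance limit for categorical features can be made concrete with a short sketch. It rests on one assumption: any classifier that depends only on the features must give the same output to all samples sharing the same combination of feature values (a "bucket"). Under that assumption, the best accuracy comes from predicting the majority class in each bucket, and the best ROC curve comes from ranking buckets by their fraction of positives. The function name `intrinsic_limits` and the data layout are hypothetical, and this is not necessarily the exact pseudocode given in the paper.

```python
from collections import Counter

def intrinsic_limits(rows, labels):
    """Return (best accuracy, best AUC) on the given samples for binary labels 0/1.

    rows: sequences of categorical feature values; both classes must be present.
    """
    pos, neg = Counter(), Counter()
    for row, label in zip(rows, labels):
        bucket = tuple(row)  # one bucket per distinct feature combination
        if label == 1:
            pos[bucket] += 1
        else:
            neg[bucket] += 1

    buckets = set(pos) | set(neg)
    n_pos, n_neg = sum(pos.values()), sum(neg.values())

    # Best accuracy: predict the majority class inside every bucket.
    best_accuracy = sum(max(pos[b], neg[b]) for b in buckets) / (n_pos + n_neg)

    # Best AUC: rank buckets by fraction of positives, then integrate the ROC
    # curve with the trapezoidal rule (ties inside a bucket give a straight segment).
    ordered = sorted(buckets, key=lambda b: pos[b] / (pos[b] + neg[b]), reverse=True)
    auc, tpr, fpr = 0.0, 0.0, 0.0
    for b in ordered:
        new_tpr = tpr + pos[b] / n_pos
        new_fpr = fpr + neg[b] / n_neg
        auc += (new_fpr - fpr) * (tpr + new_tpr) / 2
        tpr, fpr = new_tpr, new_fpr
    return best_accuracy, auc
```

For example, with three buckets containing (2 positive, 1 negative), (0 positive, 2 negative), and (1 positive, 0 negative) samples, no classifier can separate the mixed bucket, so the limits computed on those samples fall below 1 for both metrics.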