Abstract:Diarrhetic Shellfish Poisoning (DSP) is a global health threat arising from shellfish contaminated with toxins produced by dinoflagellates. The condition, with its widespread incidence, high morbidity rate, and persistent shellfish toxicity, poses risks to public health and the shellfish industry. High biomass of toxin-producing algae such as DSP are known as Harmful Algal Blooms (HABs). Monitoring and forecasting systems are crucial for mitigating HABs impact. Predicting harmful algal blooms involves a time-series-based problem with a strong historical seasonal component, however, recent anomalies due to changes in meteorological and oceanographic events have been observed. Stream Learning stands out as one of the most promising approaches for addressing time-series-based problems with concept drifts. However, its efficacy in predicting HABs remains unproven and needs to be tested in comparison with Batch Learning. Historical data availability is a critical point in developing predictive systems. In oceanography, the available data collection can have some constrains and limitations, which has led to exploring new tools to obtain more exhaustive time series. In this study, a machine learning workflow for predicting the number of cells of a toxic dinoflagellate, Dinophysis acuminata, was developed with several key advancements. Seven machine learning algorithms were compared within two learning paradigms. Notably, the output data from CROCO, the ocean hydrodynamic model, was employed as the primary dataset, palliating the limitation of time-continuous historical data. This study highlights the value of models interpretability, fair models comparison methodology, and the incorporation of Stream Learning models. The model DoME, with an average R2 of 0.77 in the 3-day-ahead prediction, emerged as the most effective and interpretable predictor, outperforming the other algorithms.
Abstract:Harmful algal blooms (HABs) are episodes of high concentrations of algae that are potentially toxic for human consumption. Mollusc farming can be affected by HABs because, as filter feeders, they can accumulate high concentrations of marine biotoxins in their tissues. To avoid the risk to human consumption, harvesting is prohibited when toxicity is detected. At present, the closure of production areas is based on expert knowledge and the existence of a predictive model would help when conditions are complex and sampling is not possible. Although the concentration of toxin in meat is the method most commonly used by experts in the control of shellfish production areas, it is rarely used as a target by automatic prediction models. This is largely due to the irregularity of the data due to the established sampling programs. As an alternative, the activity status of production areas has been proposed as a target variable based on whether mollusc meat has a toxicity level below or above the legal limit. This new option is the most similar to the actual functioning of the control of shellfish production areas. For this purpose, we have made a comparison between hybrid machine learning models like Neural-Network-Adding Bootstrap (BAGNET) and Discriminative Nearest Neighbor Classification (SVM-KNN) when estimating the state of production areas. The study has been carried out in several estuaries with different levels of complexity in the episodes of algal blooms to demonstrate the generalization capacity of the models in bloom detection. As a result, we could observe that, with an average recall value of 93.41% and without dropping below 90% in any of the estuaries, BAGNET outperforms the other models both in terms of results and robustness.
Abstract:Mussel farming is one of the most important aquaculture industries. The main risk to mussel farming is harmful algal blooms (HABs), which pose a risk to human consumption. In Galicia, the Spanish main producer of cultivated mussels, the opening and closing of the production areas is controlled by a monitoring program. In addition to the closures resulting from the presence of toxicity exceeding the legal threshold, in the absence of a confirmatory sampling and the existence of risk factors, precautionary closures may be applied. These decisions are made by experts without the support or formalisation of the experience on which they are based. Therefore, this work proposes a predictive model capable of supporting the application of precautionary closures. Achieving sensitivity, accuracy and kappa index values of 97.34%, 91.83% and 0.75 respectively, the kNN algorithm has provided the best results. This allows the creation of a system capable of helping in complex situations where forecast errors are more common.
Abstract:Path Planning methods for autonomous control of Unmanned Aerial Vehicle (UAV) swarms are on the rise because of all the advantages they bring. There are more and more scenarios where autonomous control of multiple UAVs is required. Most of these scenarios present a large number of obstacles, such as power lines or trees. If all UAVs can be operated autonomously, personnel expenses can be decreased. In addition, if their flight paths are optimal, energy consumption is reduced. This ensures that more battery time is left for other operations. In this paper, a Reinforcement Learning based system is proposed for solving this problem in environments with obstacles by making use of Q-Learning. This method allows a model, in this particular case an Artificial Neural Network, to self-adjust by learning from its mistakes and achievements. Regardless of the size of the map or the number of UAVs in the swarm, the goal of these paths is to ensure complete coverage of an area with fixed obstacles for tasks, like field prospecting. Setting goals or having any prior information aside from the provided map is not required. For experimentation, five maps of different sizes with different obstacles were used. The experiments were performed with different number of UAVs. For the calculation of the results, the number of actions taken by all UAVs to complete the task in each experiment is taken into account. The lower the number of actions, the shorter the path and the lower the energy consumption. The results are satisfactory, showing that the system obtains solutions in fewer movements the more UAVs there are. For a better presentation, these results have been compared to another state-of-the-art approach.
Abstract:Over the years, several approaches have tried to tackle the problem of performing an automatic scoring of the sleeping stages. Although any polysomnography usually collects over a dozen of different signals, this particular problem has been mainly tackled by using only the Electroencephalograms presented in those records. On the other hand, the other recorded signals have been mainly ignored by most works. This paper explores and compares the convenience of using additional signals apart from electroencephalograms. More specifically, this work uses the SHHS-1 dataset with 5,804 patients containing an electromyogram recorded simultaneously as two electroencephalograms. To compare the results, first, the same architecture has been evaluated with different input signals and all their possible combinations. These tests show how, using more than one signal especially if they are from different sources, improves the results of the classification. Additionally, the best models obtained for each combination of one or more signals have been used in ensemble models and, its performance has been compared showing the convenience of using these multi-signal models to improve the classification. The best overall model, an ensemble of Depth-wise Separational Convolutional Neural Networks, has achieved an accuracy of 86.06\% with a Cohen's Kappa of 0.80 and a $F_{1}$ of 0.77. Up to date, those are the best results on the complete dataset and it shows a significant improvement in the precision and recall for the most uncommon class in the dataset.
Abstract:Sleeping problems have become one of the major diseases all over the world. To tackle this issue, the basic tool used by specialists is the Polysomnogram, which is a collection of different signals recorded during sleep. After its recording, the specialists have to score the different signals according to one of the standard guidelines. This process is carried out manually, which can be highly time-consuming and very prone to annotation errors. Therefore, over the years, many approaches have been explored in an attempt to support the specialists in this task. In this paper, an approach based on convolutional neural networks is presented, where an in-depth comparison is performed in order to determine the convenience of using more than one signal simultaneously as input. Additionally, the models were also used as parts of an ensemble model to check whether any useful information can be extracted from signal processing a single signal at a time which the dual-signal model cannot identify. Tests have been performed by using a well-known dataset called expanded sleep-EDF, which is the most commonly used dataset as the benchmark for this problem. The tests were carried out with a leave-one-out cross-validation over the patients, which ensures that there is no possible contamination between training and testing. The resulting proposal is a network smaller than previously published ones, but which overcomes the results of any previous models on the same dataset. The best result shows an accuracy of 92.67\% and a Cohen's Kappa value over 0.84 compared to human experts.
Abstract:This paper describes a new method for Symbolic Regression that allows to find mathematical expressions from a dataset. This method has a strong mathematical basis. As opposed to other methods such as Genetic Programming, this method is deterministic, and does not involve the creation of a population of initial solutions. Instead of it, a simple expression is being grown until it fits the data. The experiments performed show that the results are as good as other Machine Learning methods, in a very low computational time. Another advantage of this technique is that the complexity of the expressions can be limited, so the system can return mathematical expressions that can be easily analysed by the user, in opposition to other techniques like GSGP.
Abstract:This paper proposes a new model for music prediction based on Variational Autoencoders (VAEs). In this work, VAEs are used in a novel way in order to address two different problems: music representation into the latent space, and using this representation to make predictions of the future values of the musical piece. This approach was trained with different songs of a classical composer. As a result, the system can represent the music in the latent space, and make accurate predictions. Therefore, the system can be used to compose new music either from an existing piece or from a random starting point. An additional feature of this system is that a small dataset was used for training. However, results show that the system is able to return accurate representations and predictions in unseen data.
Abstract:Traditionally, signal classification is a process in which previous knowledge of the signals is needed. Human experts decide which features are extracted from the signals, and used as inputs to the classification system. This requirement can make significant unknown information of the signal be missed by the experts and not be included in the features. This paper proposes a new method that automatically analyses the signals and extracts the features without any human participation. Therefore, there is no need for previous knowledge about the signals to be classified. The proposed method is based on Genetic Programming and, in order to test this method, it has been applied to a well-known EEG database related to epilepsy, a disease suffered by millions of people. As the results section shows, high accuracies in classification are obtained
Abstract:Data mining and data classification over biomedical data are two of the most important research fields in computer science. Among the great diversity of techniques that can be used for this purpose, Artifical Neural Networks (ANNs) is one of the most suited. One of the main problems in the development of this technique is the slow performance of the full process. Traditionally, in this development process, human experts are needed to experiment with different architectural procedures until they find the one that presents the correct results for solving a specific problem. However, many studies have emerged in which different ANN developmental techniques, more or less automated, are described. In this paper, the authors have focused on developing a new technique to perform this process over biomedical data. The new technique is described in which two Evolutionary Computation (EC) techniques are mixed to automatically develop ANNs. These techniques are Genetic Algorithms and Genetic Programming. The work goes further, and the system described here allows to obtain simplified networks with a low number of neurons to resolve the problems. The system is compared with the already existent system which also uses EC over a set of well-known problems. The conclusions reached from these comparisons indicate that this new system produces very good results, which in the worst case are at least comparable to existing techniques and in many cases are substantially better.