Abstract:Recent advances in deep learning have shown significant potential for solving combinatorial optimization problems in real-time. Unlike traditional methods, deep learning can generate high-quality solutions efficiently, which is crucial for applications like routing and scheduling. However, existing approaches like deep reinforcement learning (RL) and behavioral cloning have notable limitations, with deep RL suffering from slow learning and behavioral cloning relying solely on expert actions, which can lead to generalization issues and neglect of the optimization objective. This paper introduces a novel offline RL method designed for combinatorial optimization problems with complex constraints, where the state is represented as a heterogeneous graph and the action space is variable. Our approach encodes actions in edge attributes and balances expected rewards with the imitation of expert solutions. We demonstrate the effectiveness of this method on job-shop scheduling and flexible job-shop scheduling benchmarks, achieving superior performance compared to state-of-the-art techniques.
Abstract:In cognitive neuroscience and brain-computer interface research, accurately predicting imagined stimuli is crucial. This study investigates the effectiveness of Domain Adaptation (DA) in enhancing imagery prediction using primarily visual data from fMRI scans of 18 subjects. Initially, we train a baseline model on visual stimuli to predict imagined stimuli, utilizing data from 14 brain regions. We then develop several models to improve imagery prediction, comparing different DA methods. Our results demonstrate that DA significantly enhances imagery prediction, especially with the Regular Transfer approach. We then conduct a DA-enhanced searchlight analysis using Regular Transfer, followed by permutation-based statistical tests to identify brain regions where imagery decoding is consistently above chance across subjects. Our DA-enhanced searchlight predicts imagery contents in a highly distributed set of brain regions, including the visual cortex and the frontoparietal cortex, thereby outperforming standard cross-domain classification methods. The complete code and data for this paper have been made openly available for the use of the scientific community.
Abstract:In this paper we introduce a word embedding composition method based on the intuitive idea that a fair embedding representation for a given set of words should satisfy that the new vector will be at the same distance of the vector representation of each of its constituents, and this distance should be minimized. The embedding composition method can work with static and contextualized word representations, it can be applied to create representations of sentences and learn also representations of sets of words that are not necessarily organized as a sequence. We theoretically characterize the conditions for the existence of this type of representation and derive the solution. We evaluate the method in data augmentation and sentence classification tasks, investigating several design choices of embeddings and composition methods. We show that our approach excels in solving probing tasks designed to capture simple linguistic features of sentences.
Abstract:The use of machine learning algorithms to investigate phase transitions in physical systems is a valuable way to better understand the characteristics of these systems. Neural networks have been used to extract information of phases and phase transitions directly from many-body configurations. However, one limitation of neural networks is that they require the definition of the model architecture and parameters previous to their application, and such determination is itself a difficult problem. In this paper, we investigate for the first time the relationship between the accuracy of neural networks for information of phases and the network configuration (that comprises the architecture and hyperparameters). We formulate the phase analysis as a regression task, address the question of generating data that reflects the different states of the physical system, and evaluate the performance of neural architecture search for this task. After obtaining the optimized architectures, we further implement smart data processing and analytics by means of neuron coverage metrics, assessing the capability of these metrics to estimate phase transitions. Our results identify the neuron coverage metric as promising for detecting phase transitions in physical systems.
Abstract:The lack of transparency of Deep Neural Networks continues to be a limitation that severely undermines their reliability and usage in high-stakes applications. Promising approaches to overcome such limitations are Prototype-Based Self-Explainable Neural Networks (PSENNs), whose predictions rely on the similarity between the input at hand and a set of prototypical representations of the output classes, offering therefore a deep, yet transparent-by-design, architecture. So far, such models have been designed by considering pointwise estimates for the prototypes, which remain fixed after the learning phase of the model. In this paper, we introduce a probabilistic reformulation of PSENNs, called Prob-PSENN, which replaces point estimates for the prototypes with probability distributions over their values. This provides not only a more flexible framework for an end-to-end learning of prototypes, but can also capture the explanatory uncertainty of the model, which is a missing feature in previous approaches. In addition, since the prototypes determine both the explanation and the prediction, Prob-PSENNs allow us to detect when the model is making uninformed or uncertain predictions, and to obtain valid explanations for them. Our experiments demonstrate that Prob-PSENNs provide more meaningful and robust explanations than their non-probabilistic counterparts, thus enhancing the explainability and reliability of the models.
Abstract:Recent advancements in the flexible job-shop scheduling problem (FJSSP) are primarily based on deep reinforcement learning (DRL) due to its ability to generate high-quality, real-time solutions. However, DRL approaches often fail to fully harness the strengths of existing techniques such as exact methods or constraint programming (CP), which can excel at finding optimal or near-optimal solutions for smaller instances. This paper aims to integrate CP within a deep learning (DL) based methodology, leveraging the benefits of both. In this paper, we introduce a method that involves training a DL model using optimal solutions generated by CP, ensuring the model learns from high-quality data, thereby eliminating the need for the extensive exploration typical in DRL and enhancing overall performance. Further, we integrate CP into our DL framework to jointly construct solutions, utilizing DL for the initial complex stages and transitioning to CP for optimal resolution as the problem is simplified. Our hybrid approach has been extensively tested on three public FJSSP benchmarks, demonstrating superior performance over five state-of-the-art DRL approaches and a widely-used CP solver. Additionally, with the objective of exploring the application to other combinatorial optimization problems, promising preliminary results are presented on applying our hybrid approach to the traveling salesman problem, combining an exact method with a well-known DRL method.
Abstract:The Flexible Job Shop Scheduling Problem (FJSSP) has been extensively studied in the literature, and multiple approaches have been proposed within the heuristic, exact, and metaheuristic methods. However, the industry's demand to be able to respond in real-time to disruptive events has generated the necessity to be able to generate new schedules within a few seconds. Among these methods, under this constraint, only dispatching rules (DRs) are capable of generating schedules, even though their quality can be improved. To improve the results, recent methods have been proposed for modeling the FJSSP as a Markov Decision Process (MDP) and employing reinforcement learning to create a policy that generates an optimal solution assigning operations to machines. Nonetheless, there is still room for improvement, particularly in the larger FJSSP instances which are common in real-world scenarios. Therefore, the objective of this paper is to propose a method capable of robustly solving large instances of the FJSSP. To achieve this, we propose a novel way of modeling the FJSSP as an MDP using graph neural networks. We also present two methods to make inference more robust: generating a diverse set of scheduling policies that can be parallelized and limiting them using DRs. We have tested our approach on synthetically generated instances and various public benchmarks and found that our approach outperforms dispatching rules and achieves better results than three other recent deep reinforcement learning methods on larger FJSSP instances.
Abstract:In some machine learning applications the availability of labeled instances for supervised classification is limited while unlabeled instances are abundant. Semi-supervised learning algorithms deal with these scenarios and attempt to exploit the information contained in the unlabeled examples. In this paper, we address the question of how to evolve neural networks for semi-supervised problems. We introduce neuroevolutionary approaches that exploit unlabeled instances by using neuron coverage metrics computed on the neural network architecture encoded by each candidate solution. Neuron coverage metrics resemble code coverage metrics used to test software, but are oriented to quantify how the different neural network components are covered by test instances. In our neuroevolutionary approach, we define fitness functions that combine classification accuracy computed on labeled examples and neuron coverage metrics evaluated using unlabeled examples. We assess the impact of these functions on semi-supervised problems with a varying amount of labeled instances. Our results show that the use of neuron coverage metrics helps neuroevolution to become less sensitive to the scarcity of labeled data, and can lead in some cases to a more robust generalization of the learned classifiers.
Abstract:Physics-Informed Neural Networks (PINNs) are Neural Network architectures trained to emulate solutions of differential equations without the necessity of solution data. They are currently ubiquitous in the scientific literature due to their flexible and promising settings. However, very little of the available research provides practical studies that aim for a better quantitative understanding of such architecture and its functioning. In this paper, we analyze the performance of PINNs for various architectural hyperparameters and algorithmic settings based on a novel error metric and other factors such as training time. The proposed metric and approach are tailored to evaluate how well a PINN generalizes to points outside its training domain. Besides, we investigate the effect of the algorithmic setup on the outcome prediction of a PINN, inside and outside its training domain, to explore the effect of each hyperparameter. Through our study, we assess how the algorithmic setup of PINNs influences their potential for generalization and deduce the settings which maximize the potential of a PINN for accurate generalization. The study that we present returns insightful and at times counterintuitive results on PINNs. These results can be useful in PINN applications when defining the model and evaluating it.
Abstract:Reliable deployment of machine learning models such as neural networks continues to be challenging due to several limitations. Some of the main shortcomings are the lack of interpretability and the lack of robustness against adversarial examples or out-of-distribution inputs. In this paper, we explore the possibilities and limits of adversarial attacks for explainable machine learning models. First, we extend the notion of adversarial examples to fit in explainable machine learning scenarios, in which the inputs, the output classifications and the explanations of the model's decisions are assessed by humans. Next, we propose a comprehensive framework to study whether (and how) adversarial examples can be generated for explainable models under human assessment, introducing novel attack paradigms. In particular, our framework considers a wide range of relevant (yet often ignored) factors such as the type of problem, the user expertise or the objective of the explanations in order to identify the attack strategies that should be adopted in each scenario to successfully deceive the model (and the human). These contributions intend to serve as a basis for a more rigorous and realistic study of adversarial examples in the field of explainable machine learning.