Abstract: Wearable devices for seizure detection could significantly improve the quality of life of epileptic patients. However, existing solutions, which mostly rely on the full electrode set of electroencephalogram (EEG) measurements, can be inconvenient for everyday use. In this paper, we propose a novel knowledge distillation approach to transfer the knowledge from a sophisticated seizure detector (called the teacher), trained on data from the full set of electrodes, to new detectors (called the students), which both provide lightweight implementations and significantly reduce the number of electrodes needed for recording the EEG. We consider the case where the teacher and the student seizure detectors are graph neural networks (GNN), since these architectures actively use the connectivity information. We consider two cases: (a) when a single student is learnt for all the patients using preselected channels; and (b) when personalized students are learnt for every individual patient, with personalized channel selection using a Gumbel-softmax approach. Our experiments on the publicly available Temple University Hospital EEG Seizure Data Corpus (TUSZ) show that both knowledge distillation and personalization play significant roles in improving the performance of seizure detection, particularly for patients with scarce EEG data. We observe that, using as few as two channels, we are able to obtain competitive seizure detection performance. This, in turn, shows the potential of our approach in the more realistic scenario of wearable devices for personalized monitoring of seizures, even with few recordings.
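As an illustration of the personalized channel selection mentioned above, here is a minimal PyTorch sketch of a Gumbel-softmax channel-selection layer; the class name, channel counts, and temperature are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GumbelChannelSelector(nn.Module):
    """Hypothetical differentiable EEG-channel selector (shapes assumed)."""
    def __init__(self, n_channels: int, n_selected: int, tau: float = 1.0):
        super().__init__()
        # One categorical distribution (over channels) per selected channel.
        self.logits = nn.Parameter(torch.zeros(n_selected, n_channels))
        self.tau = tau

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_channels, time). `hard=True` yields discrete picks
        # with straight-through gradients, so selection is learnt end-to-end.
        sel = F.gumbel_softmax(self.logits, tau=self.tau, hard=True)  # (k, C)
        return torch.einsum("kc,bct->bkt", sel, x)  # (batch, k, time)

selector = GumbelChannelSelector(n_channels=19, n_selected=2)
eeg = torch.randn(8, 19, 256)   # toy batch: 8 windows, 19 channels
print(selector(eeg).shape)      # torch.Size([8, 2, 256])
```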
Abstract: In this paper, we propose a personalized seizure detection and classification framework that quickly adapts to a specific patient from limited seizure samples. We achieve this by combining two paradigms that have recently seen much success in a wide variety of real-world applications: graph neural networks (GNN) and meta-learning. We train a Meta-GNN based classifier that learns a global model from a set of training patients, such that this global model can eventually be adapted to a new, unseen patient using very limited samples. We apply our approach to the TUSZ dataset, one of the largest publicly available benchmark datasets for epilepsy. We show that our method outperforms the baselines, reaching 82.7% accuracy and an 82.08% F1 score after only 20 iterations on new, unseen patients.
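The few-iteration adaptation described above can be sketched as a generic MAML-style inner loop in PyTorch; the toy model, loss, and hyperparameters below are assumptions, and the paper's Meta-GNN architecture is not reproduced here.

```python
import torch
import torch.nn as nn

# Generic MAML-style inner loop (needs torch.func, i.e. PyTorch >= 2.0).
def adapt(model, loss_fn, x_support, y_support, inner_lr=0.01, steps=20):
    params = {name: p.clone() for name, p in model.named_parameters()}
    for _ in range(steps):
        preds = torch.func.functional_call(model, params, (x_support,))
        loss = loss_fn(preds, y_support)
        grads = torch.autograd.grad(loss, list(params.values()))
        params = {name: p - inner_lr * g
                  for (name, p), g in zip(params.items(), grads)}
    return params  # patient-specific parameters after `steps` iterations

# Toy stand-in for the global model and a small support set.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
x_s, y_s = torch.randn(10, 16), torch.randint(0, 2, (10,))
adapted = adapt(model, nn.CrossEntropyLoss(), x_s, y_s)
```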
Abstract: Meta-learning refers to the process of abstracting a learning rule for a class of tasks through a meta-parameter that captures the inductive bias for the class. The meta-parameter is used to achieve fast adaptation to unseen tasks from the class, given a few training samples. While meta-learning implicitly assumes that the tasks are similar, it is generally unclear how this similarity can be quantified. Further, many popular meta-learning approaches do not actively use such a task similarity in solving the tasks. In this paper, we propose a task-similarity aware nonparametric meta-learning algorithm that explicitly employs the similarity/dissimilarity between tasks using nonparametric kernel regression. Our approach models the task-specific parameters as lying in a reproducing kernel Hilbert space, wherein the kernel function captures the similarity across tasks. The proposed algorithm iteratively learns a meta-parameter which is used to assign a task-specific descriptor to every task. The task descriptors are then used to quantify the similarity through the kernel function. We show how our approach generalizes the popular meta-learning approaches of model-agnostic meta-learning (MAML) and meta-stochastic gradient descent (Meta-SGD). Numerical experiments with regression tasks show that our algorithm performs well even in the presence of outlier or dissimilar tasks, validating the proposed approach.
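To make the kernel-based similarity weighting concrete, the following numpy sketch applies Nadaraya-Watson regression over task descriptors; it is a simplified stand-in for the paper's RKHS formulation, and the Gaussian kernel, bandwidth, and all names are assumptions.

```python
import numpy as np

def kernel(d1, d2, bandwidth=1.0):
    # Gaussian kernel on task descriptors: similar tasks get large weight.
    return np.exp(-np.sum((d1 - d2) ** 2) / (2 * bandwidth ** 2))

def task_parameters(new_descriptor, descriptors, task_params):
    # Nadaraya-Watson estimate: similarity-weighted average of the
    # task-specific parameter vectors of previously seen tasks.
    w = np.array([kernel(new_descriptor, d) for d in descriptors])
    w /= w.sum()
    return w @ task_params  # (n_tasks,) @ (n_tasks, dim) -> (dim,)

rng = np.random.default_rng(0)
descriptors = rng.standard_normal((5, 3))  # one descriptor per seen task
task_params = rng.standard_normal((5, 8))  # one parameter vector per task
print(task_parameters(rng.standard_normal(3), descriptors, task_params))
```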
Abstract: We provide a predictive analysis of the spread of COVID-19, the disease caused by the SARS-CoV-2 virus, using the dataset made publicly available online by Johns Hopkins University. Our main objective is to predict the number of infected people in different countries over the next 14 days. The predictive analysis is done using time-series data transformed to a logarithmic scale. We use two well-known methods for prediction: polynomial regression and neural networks. As the amount of training data for each country is limited, we use a single-layer neural network called the extreme learning machine (ELM) to avoid over-fitting. Due to the non-stationary nature of the time series, a sliding-window approach is used to provide a more accurate prediction.
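A minimal numpy sketch of the ELM pipeline described above, assuming a one-step-ahead setup on a log-scaled series; the window length, hidden width, and ridge term are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_fit_predict(series, window=7, hidden=20, ridge=1e-3):
    y = np.log1p(series)                       # log scale, as in the paper
    # Sliding windows of past values as inputs, the next value as target.
    X = np.lib.stride_tricks.sliding_window_view(y[:-1], window)
    t = y[window:]
    W = rng.standard_normal((window, hidden))  # random, fixed input weights
    H = np.tanh(X @ W)                         # hidden-layer activations
    # Only the output weights are trained, via ridge-regularized least squares.
    beta = np.linalg.solve(H.T @ H + ridge * np.eye(hidden), H.T @ t)
    h_new = np.tanh(y[-window:] @ W)           # features for the next step
    return np.expm1(h_new @ beta)              # back to the original scale

counts = np.cumsum(rng.integers(0, 100, size=60)).astype(float)  # toy series
print(elm_fit_predict(counts))                 # predicted next-day count
```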
Abstract: The contribution of this paper is a framework for the training and evaluation of Model Predictive Control (MPC) implemented using constrained neural networks. Recent studies have proposed using neural networks with differentiable convex optimization layers to implement model predictive controllers. The motivation is to replace real-time optimization in safety-critical feedback control systems with learnt mappings in the form of neural networks with optimization layers. Such mappings take the state vector as input and predict the control law as output. The learning takes place using training data generated from offline MPC simulations. However, a general framework for characterizing learning approaches in terms of both model validation and efficient training data generation is lacking in the literature. In this paper, we take the first steps towards developing such a coherent framework. We study neural network architectures in PyTorch with the explicit MPC constraints implemented as a differentiable optimization layer using CVXPY. We discuss how the learning problem has similarities with system identification, in particular input design, model structure selection and model validation. We propose an efficient approach for generating MPC input samples subject to the MPC model constraints using a hit-and-run sampler. The corresponding true outputs are generated by solving the MPC offline using OSQP. We propose different metrics to validate the resulting approaches. Our study further aims to explore the advantages of incorporating domain knowledge into the network structure from a training and evaluation perspective. Different model structures are numerically tested using the proposed framework in order to obtain more insight into the properties of constrained neural network-based MPC.
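The hit-and-run idea for generating feasible MPC training inputs can be sketched in a few lines of numpy: sample uniformly from a polytope {x : Ax <= b} by walking along random chords. The constraint data below is a toy assumption, not the MPC constraints from the paper.

```python
import numpy as np

def hit_and_run(A, b, x0, n_samples, seed=0):
    rng = np.random.default_rng(seed)
    x, samples = x0.astype(float).copy(), []
    for _ in range(n_samples):
        d = rng.standard_normal(x.size)
        d /= np.linalg.norm(d)
        # Feasible step sizes t must satisfy A (x + t d) <= b.
        Ad, slack = A @ d, b - A @ x
        t_hi = np.min(slack[Ad > 0] / Ad[Ad > 0]) if np.any(Ad > 0) else 1.0
        t_lo = np.max(slack[Ad < 0] / Ad[Ad < 0]) if np.any(Ad < 0) else -1.0
        x = x + rng.uniform(t_lo, t_hi) * d    # uniform point on the chord
        samples.append(x.copy())
    return np.array(samples)

# Example: sample from the box [-1, 1]^2.
A = np.vstack([np.eye(2), -np.eye(2)])
b = np.ones(4)
print(hit_and_run(A, b, np.zeros(2), n_samples=5))
```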
Abstract: We design a ReLU-based multilayer neural network to generate a rich high-dimensional feature vector. The feature guarantees a monotonically decreasing training cost as the number of layers increases. We design the weight matrix in each layer to extend the feature vectors to a higher-dimensional space while providing a richer representation in the sense of training cost. Linear projection to the target in the higher-dimensional space leads to a lower training cost if a convex cost is minimized. An $\ell_2$-norm convex constraint is used in the minimization to improve the generalization error and avoid overfitting. The regularization hyperparameters of the network are derived analytically to guarantee a monotonic decrease of the training cost, thereby eliminating the need for cross-validation to find the regularization hyperparameter in each layer.
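The following numpy sketch conveys the layer-wise construction: lift the features through a ReLU layer, fit an $\ell_2$-regularized linear projection to the target, and track the training cost per layer. Random lifting weights are used purely for illustration; unlike the designed weight matrices in the paper, they do not guarantee a monotone decrease of the cost.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_layer(H, width):
    # Lift the current features to a wider space through a ReLU layer.
    W = rng.standard_normal((H.shape[1], width)) / np.sqrt(H.shape[1])
    return np.maximum(H @ W, 0.0)

def ridge_fit(H, T, lam=1e-2):
    # l2-regularized linear projection from features to targets.
    return np.linalg.solve(H.T @ H + lam * np.eye(H.shape[1]), H.T @ T)

X = rng.standard_normal((100, 10))
T = rng.standard_normal((100, 1))
H = X
for width in (64, 128, 256):               # one report per added layer
    H = add_layer(H, width)
    O = ridge_fit(H, T)
    print(f"width {width}: training cost {np.mean((H @ O - T) ** 2):.4f}")
```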
Abstract: We address the issue of estimating the topology and dynamics of sparse linear dynamic networks in a hyperparameter-free setting. We propose a method to estimate the network dynamics in a computationally efficient and tuning-free iterative framework known as SPICE (SParse Iterative Covariance-based Estimation). The estimated dynamics directly reveal the underlying topology. Our approach does not assume that the network is undirected, and it is applicable even with varying noise levels across the modules of the network. We also do not assume any explicit prior knowledge of the network dynamics. Numerical experiments with realistic dynamic networks illustrate the usefulness of our method.
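As a rough illustration of how iterative sparse estimation of one module's incoming dynamics reveals topology, here is a generic iteratively reweighted least-squares sketch; it is not the SPICE update from the paper, and the regression model and constants are assumptions.

```python
import numpy as np

def irls_sparse(Phi, y, n_iter=30, eps=1e-6):
    # Assumed model: y = Phi @ theta + noise, with sparse theta.
    theta = np.linalg.lstsq(Phi, y, rcond=None)[0]
    for _ in range(n_iter):
        # Reweighted l2 penalty approximating an l1 penalty on theta.
        W = np.diag(1.0 / (np.abs(theta) + eps))
        theta = np.linalg.solve(Phi.T @ Phi + W, Phi.T @ y)
    return theta

rng = np.random.default_rng(0)
Phi = rng.standard_normal((50, 10))
theta_true = np.zeros(10); theta_true[[1, 6]] = [2.0, -1.5]
y = Phi @ theta_true + 0.05 * rng.standard_normal(50)
# Nonzero entries of the estimate indicate which nodes drive this module,
# i.e., the incoming edges of the network topology.
print(np.round(irls_sparse(Phi, y), 2))
```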
Abstract: Kernel and linear regression have recently been explored for the prediction of graph signals as the output, given arbitrary input signals that are agnostic to the graph. In many real-world problems, the graph expands over time as new nodes get introduced. Keeping this premise in mind, we propose a method to recursively obtain the optimal prediction or regression coefficients for the recently proposed Linear Regression over Graphs (LRG) as the graph expands with incoming nodes. This comes as a natural consequence of the structure of the regression problem, and obviates the need to solve a new regression problem each time a new node is added. Experiments with real-world graph signals show that our approach results in good prediction performance, which tends to be close to that obtained from knowing the entire graph a priori.
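The recursive flavor of such a coefficient update can be sketched with a standard recursive least-squares step based on the Sherman-Morrison identity; this is a generic illustration and omits the graph-specific structure of the actual LRG recursion.

```python
import numpy as np

class RecursiveRegressor:
    """Generic recursive least squares: update coefficients per new node."""
    def __init__(self, dim, ridge=1e-3):
        self.P = np.eye(dim) / ridge     # inverse of (X^T X + ridge * I)
        self.b = np.zeros(dim)           # running X^T y

    def add_node(self, x_new, y_new):
        # Sherman-Morrison rank-one update of the inverse; no re-solve needed.
        Px = self.P @ x_new
        self.P -= np.outer(Px, Px) / (1.0 + x_new @ Px)
        self.b += y_new * x_new
        return self.P @ self.b           # updated regression coefficients

rng = np.random.default_rng(0)
reg = RecursiveRegressor(dim=4)
for _ in range(10):                      # nodes arriving one by one
    x, y = rng.standard_normal(4), rng.standard_normal()
    coeffs = reg.add_node(x, y)
print(coeffs)
```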
Abstract: We propose kernel regression for predicting output vectors which are smooth over a given graph, in the presence of sparse noise. Sparse noise models training outputs that are corrupted either by missing samples or by large perturbations. The presence of sparse noise is handled through the appropriate use of the $\ell_1$-norm along with the $\ell_2$-norm in a convex cost function. To optimize the cost function, we propose an iteratively reweighted least-squares (IRLS) approach that is suitable for kernel substitution, or the kernel trick, due to the availability of a closed-form solution. Simulations using real-world temperature data show the efficacy of our proposed method, particularly for limited-size training datasets.
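A compact numpy sketch of the IRLS idea: the $\ell_1$ data-fit term is handled by residual reweighting, so each iteration retains a closed-form kernelized solution. The RBF kernel and all constants are assumptions, and the graph-smoothness term is omitted for brevity.

```python
import numpy as np

def rbf(X1, X2, gamma=0.5):
    d = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d)

def robust_kernel_regression(X, y, lam=1e-2, n_iter=20, eps=1e-6):
    K = rbf(X, X)
    alpha = np.linalg.solve(K + lam * np.eye(len(y)), y)  # l2 warm start
    for _ in range(n_iter):
        r = K @ alpha - y                       # residuals on training data
        W = np.diag(1.0 / (np.abs(r) + eps))    # large residuals downweighted
        # Closed-form reweighted step: (W K + lam I) alpha = W y.
        alpha = np.linalg.solve(W @ K + lam * np.eye(len(y)), W @ y)
    return alpha                                # predictions: rbf(X_new, X) @ alpha

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 2))
y = np.sin(X[:, 0]); y[::7] += 5.0             # sparse, large corruptions
print(robust_kernel_regression(X, y)[:5])
```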
Abstract: We propose Gaussian processes for signals over graphs (GPG), using the a priori knowledge that the target vectors lie over a graph. We incorporate this information using a graph-Laplacian based regularization, which enforces the target vectors to have a specific profile in terms of graph Fourier transform coefficients, for example lowpass or bandpass graph signals. We discuss how the regularization affects the mean and the variance of the prediction output. In particular, we prove that the predictive variance of the GPG is strictly smaller than that of the conventional Gaussian process (GP) for any non-trivial graph. We validate our concepts by application to various real-world graph signals. Our experiments show that the performance of the GPG is superior to the GP for small training data sizes and under noisy training.
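To illustrate how a graph-Laplacian term can enter a GP, the following numpy sketch shrinks the output covariance across graph nodes through B = (I + beta*L)^(-1), giving a separable kernel K(x, x') * B over input-node pairs; the exact GPG construction in the paper may differ, and the kernel choice and constants are assumptions.

```python
import numpy as np

def gpg_posterior_mean(X, Y, x_new, L, beta=1.0, gamma=0.5, noise=1e-2):
    n, m = Y.shape                       # n scalar inputs, m graph nodes
    B = np.linalg.inv(np.eye(m) + beta * L)       # Laplacian-shrunk node cov
    K = np.exp(-gamma * (X[:, None] - X[None, :]) ** 2)   # RBF over 1-D inputs
    k_new = np.exp(-gamma * (X - x_new) ** 2)             # (n,)
    # Row-major vec(Y) matches the kron(K, B) block ordering.
    C = np.kron(K, B) + noise * np.eye(n * m)
    w = np.linalg.solve(C, Y.reshape(-1))
    return np.kron(k_new, B) @ w         # posterior mean over the m nodes

# Toy example: path graph on 4 nodes, 5 training inputs.
L = np.diag([1, 2, 2, 1]) - np.diag([1, 1, 1], 1) - np.diag([1, 1, 1], -1)
rng = np.random.default_rng(0)
X, Y = np.linspace(0, 1, 5), rng.standard_normal((5, 4))
print(gpg_posterior_mean(X, Y, x_new=0.5, L=L.astype(float)))
```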