Abstract:Natural language is often the easiest and most convenient modality for humans to specify tasks for robots. However, learning to ground language to behavior typically requires impractical amounts of diverse, language-annotated demonstrations collected on each target robot. In this work, we aim to separate the problem of what to accomplish from how to accomplish it, as the former can benefit from substantial amounts of external observation-only data, and only the latter depends on a specific robot embodiment. To this end, we propose Video-Language Critic, a reward model that can be trained on readily available cross-embodiment data using contrastive learning and a temporal ranking objective, and use it to score behavior traces from a separate reinforcement learning actor. When trained on Open X-Embodiment data, our reward model enables 2x more sample-efficient policy training on Meta-World tasks than a sparse reward only, despite a significant domain gap. Using in-domain data but in a challenging task generalization setting on Meta-World, we further demonstrate more sample-efficient training than is possible with prior language-conditioned reward models that are either trained with binary classification, use static images, or do not leverage the temporal information present in video data.
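The two training signals named above can be pictured with a minimal sketch that works on precomputed critic scores. It assumes the critic has already scored every (video, caption) pair in a batch and every prefix of a successful video. The contrastive term is then a cross-entropy that should pick out the matching caption. The ranking term penalizes any later prefix that scores lower than an earlier one. The function names, the hinge form of the ranking term, and the margin are illustrative assumptions, not the paper's exact objective.

```python
import math

def contrastive_loss(scores):
    """Cross-entropy over captions: scores[i][j] is the critic's score for video i
    paired with caption j, and caption i is assumed to be the matching one."""
    total = 0.0
    for i, row in enumerate(scores):
        m = max(row)  # subtract the max before exponentiating, for stability
        log_norm = m + math.log(sum(math.exp(s - m) for s in row))
        total += log_norm - row[i]
    return total / len(scores)

def temporal_ranking_loss(prefix_scores, margin=0.0):
    """Hinge penalty whenever a later prefix of a successful video fails to score
    at least `margin` higher than an earlier prefix (hypothetical form)."""
    total, pairs = 0.0, 0
    for t in range(len(prefix_scores)):
        for u in range(t + 1, len(prefix_scores)):
            total += max(0.0, margin - (prefix_scores[u] - prefix_scores[t]))
            pairs += 1
    return total / pairs if pairs else 0.0
```

Summing the two terms, possibly with a weight, is one plausible training objective. During reinforcement learning, the score the critic assigns to a growing rollout could then act as a dense reward.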
Abstract:The non-uniform photoelectric response of infrared imaging systems results in fixed-pattern stripe noise being superimposed on infrared images, which severely reduces image quality. Because degraded infrared images are of limited use in applications, it is crucial to remove this noise while effectively preserving the original details. Existing image destriping methods struggle to simultaneously remove all stripe noise artifacts, preserve image details and structures, and maintain real-time performance. In this paper, we propose a novel algorithm for destriping degraded images, which exploits the correlation between neighbouring column signals to remove independent column stripe noise. This is achieved through an iterative deep unfolding algorithm in which the noise estimated by one network iteration is used as input to the next iteration. This progression substantially reduces the search space of possible function approximations, allowing for efficient training on larger datasets. The proposed method estimates stripe noise more precisely and therefore preserves scene details more accurately. Extensive experimental results demonstrate that the proposed model outperforms existing destriping methods on artificially corrupted images in both quantitative and qualitative assessments.
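The sketch below makes the use of neighbouring-column correlation concrete. It is a classical, non-learned stand-in for the unfolded network and assumes purely additive per-column offsets. Each iteration compares every column of the current estimate with the average of its two neighbours. It takes the median difference over rows as that column's stripe residual, then adds a damped version of it to the running noise estimate, which is fed into the next iteration. In the proposed method, a trained network plays the role of this hand-crafted update. The function name, step size, and iteration count are assumptions.

```python
from statistics import median

def destripe(image, iterations=10, step=0.5):
    """Iteratively estimate additive per-column stripe offsets.

    image: list of rows, each a list of floats (rows x cols).
    Returns (destriped image, per-column noise estimate).
    """
    rows, cols = len(image), len(image[0])
    noise = [0.0] * cols  # stripe estimate carried from one iteration to the next
    for _ in range(iterations):
        # Remove the current noise estimate from the observation.
        clean = [[image[r][c] - noise[c] for c in range(cols)] for r in range(rows)]
        residual = []
        for c in range(cols):
            # Adjacent columns (clamped at the borders) predict column c,
            # since natural scenes tend to vary smoothly from column to column.
            lo, hi = max(c - 1, 0), min(c + 1, cols - 1)
            diffs = [clean[r][c] - 0.5 * (clean[r][lo] + clean[r][hi]) for r in range(rows)]
            residual.append(median(diffs))  # the median is robust to scene edges
        mean_res = sum(residual) / cols
        # Damped update; subtracting the mean keeps the overall brightness unchanged.
        noise = [n + step * (res - mean_res) for n, res in zip(noise, residual)]
    clean = [[image[r][c] - noise[c] for c in range(cols)] for r in range(rows)]
    return clean, noise
```

The update is driven by column-to-column differences, so high-frequency stripes are suppressed within a few iterations while slowly varying scene content is largely left alone. On a flat test image with one bright column, that column is attributed to noise, and the remaining stripe contrast shrinks as the iterations proceed.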
Abstract:We develop a hybrid model-based/data-driven seizure detection algorithm called Mutual Information-based CNN-Aided Learned factor graphs (MICAL) for the detection of eclectic seizures from EEG signals. Our proposed method contains three main components: a neural mutual information (MI) estimator, a 1D convolutional neural network (CNN), and factor graph inference. Since the electrical activity in one or more regions of the brain becomes correlated during a seizure, we use neural MI estimators to measure inter-channel statistical dependence. We also design a 1D CNN to extract additional features from the raw EEG signals. The soft estimates obtained from the combined features of the neural MI estimator and the CNN do not capture the temporal correlation between different EEG blocks. We therefore use them not as estimates of the seizure state, but to compute the function nodes of a factor graph. The resulting factor graphs allow structured inference that exploits the temporal correlation to further improve detection performance. We evaluate MICAL on the public CHB-MIT database using three protocols: 6-fold leave-four-patients-out cross-validation, all-patient training, and per-patient training. Our evaluations systematically demonstrate the impact of each element of MICAL through a complete ablation study across six performance metrics. The proposed method achieves state-of-the-art performance, specifically in 6-fold leave-four-patients-out cross-validation and all-patient training, demonstrating superior generalizability.
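As a rough illustration of the inter-channel dependence features, the following sketch builds a pairwise mutual-information table for one multichannel EEG block. To stay dependency-free, it swaps the neural MI estimator for a simple histogram (plug-in) estimator, which is a different and cruder technique. The bin count and function names are assumptions.

```python
import math
from collections import Counter
from itertools import combinations

def discretize(x, bins):
    """Map samples to equal-width bin indices 0..bins-1."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / bins or 1.0  # guard against constant signals
    return [min(int((v - lo) / width), bins - 1) for v in x]

def plugin_mi(x, y, bins=8):
    """Histogram estimate of I(X;Y) in nats from paired samples."""
    n = len(x)
    bx, by = discretize(x, bins), discretize(y, bins)
    pxy = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    # sum over cells of p(a,b) * log(p(a,b) / (p(a) p(b)))
    return sum((c / n) * math.log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())

def mi_features(block, bins=8):
    """block: list of channels (each a list of samples) -> {(i, j): MI} for i < j."""
    return {(i, j): plugin_mi(block[i], block[j], bins)
            for i, j in combinations(range(len(block)), 2)}
```

The per-pair values would then be combined with the 1D CNN features. In this toy estimator, two identical channels yield a large value, and two channels whose binned values co-occur uniformly yield zero.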
Abstract:We propose convolutional neural network (CNN)-aided factor graphs, assisted by mutual information features estimated by a neural network, for seizure detection. Specifically, we use neural mutual information estimation to evaluate the correlation between different electroencephalogram (EEG) channels and use the resulting estimates as features. We then use a 1D-CNN to extract additional features from the EEG signals and use both sets of features to estimate the probability of a seizure event. Finally, learned factor graphs are employed to capture the temporal correlation in the signal. Both sets of features, from the neural mutual information estimator and from the 1D-CNN, are used to learn the factor nodes. We show that the proposed method achieves state-of-the-art performance under 6-fold leave-four-patients-out cross-validation.
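The final step uses per-block soft estimates as factor nodes rather than as decisions. It can be sketched for a two-state (non-seizure/seizure) chain. The sketch assumes a classifier outputs P(seizure | block) for each EEG block. Dividing by the state prior gives a quantity proportional to the likelihood p(block | state), which serves as the factor node. A forward-backward (sum-product) pass over an assumed transition matrix then produces smoothed posteriors. The prior and transition probabilities are illustrative values, not taken from the paper.

```python
def smooth_seizure_posteriors(probs, prior=(0.9, 0.1),
                              trans=((0.95, 0.05), (0.10, 0.90))):
    """probs[t]: a classifier's P(seizure | EEG block t).
    Returns P(seizure at block t | all blocks) via forward-backward on a 2-state chain."""
    T = len(probs)
    # Factor nodes: p(x_t | s) is proportional to p(s | x_t) / p(s) (Bayes' rule).
    lik = [((1 - p) / prior[0], p / prior[1]) for p in probs]

    def normalize(v):
        z = sum(v)
        return [a / z for a in v]

    # Forward messages, normalized at every step for numerical stability.
    alpha = [normalize([prior[s] * lik[0][s] for s in (0, 1)])]
    for t in range(1, T):
        alpha.append(normalize([
            lik[t][s] * sum(alpha[t - 1][r] * trans[r][s] for r in (0, 1))
            for s in (0, 1)]))
    # Backward messages.
    beta = [[1.0, 1.0] for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = normalize([
            sum(trans[r][s] * lik[t + 1][s] * beta[t + 1][s] for s in (0, 1))
            for r in (0, 1)])
    # Posterior probability of the seizure state at each block.
    return [alpha[t][1] * beta[t][1] / (alpha[t][0] * beta[t][0] + alpha[t][1] * beta[t][1])
            for t in range(T)]
```

With these values, an isolated block at probability 0.6 surrounded by blocks at 0.05 is pulled well below 0.6. A sustained run of high-probability blocks stays above 0.5. This is the kind of temporal consistency the factor graph is meant to add.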
Abstract:Recently, several methods have been proposed for estimating mutual information from sample data using deep neural networks, without knowing the closed-form distribution of the data. This class of estimators is referred to as neural mutual information estimators. Although very promising, such techniques have yet to be rigorously benchmarked to establish their efficacy, ease of implementation, and stability for capacity estimation, which requires joint maximization over both the input distribution and the estimator. In this paper, we compare the different techniques proposed in the literature for estimating capacity and provide a practitioner's perspective on their effectiveness. In particular, we study the performance of the mutual information neural estimator (MINE), the smoothed mutual information lower-bound estimator (SMILE), and the directed information neural estimator (DINE), and provide insights on InfoNCE. We evaluate these algorithms in terms of their ability to learn capacity-approaching input distributions for the AWGN channel, the optical intensity channel, and the peak power-constrained AWGN channel. For all of these scenarios, we provide insightful comments on various aspects of the training process, such as stability and sensitivity to initialization.
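For practitioners, it can help to see that MINE, SMILE, and InfoNCE differ mainly in how they turn a critic's scores into a mutual-information bound. The sketch assumes a square matrix f[i][j] of critic outputs on a batch. Diagonal entries come from joint samples (x_i, y_i), and off-diagonal entries come from mismatched pairs that approximate the product of the marginals. The critic network and the optimization over the input distribution are omitted. The SMILE clipping value `tau` is an assumed hyperparameter.

```python
import math

def _joint_and_marginal(f):
    n = len(f)
    joint = [f[i][i] for i in range(n)]
    marginal = [f[i][j] for i in range(n) for j in range(n) if i != j]
    return joint, marginal

def dv_bound(f):
    """MINE (Donsker-Varadhan): E_joint[f] - log E_marginal[exp(f)]."""
    joint, marg = _joint_and_marginal(f)
    return sum(joint) / len(joint) - math.log(sum(math.exp(v) for v in marg) / len(marg))

def smile_bound(f, tau=5.0):
    """SMILE: as DV, but exp(f) is clipped to [exp(-tau), exp(tau)] in the log term."""
    joint, marg = _joint_and_marginal(f)
    clipped = [min(max(math.exp(v), math.exp(-tau)), math.exp(tau)) for v in marg]
    return sum(joint) / len(joint) - math.log(sum(clipped) / len(clipped))

def infonce_bound(f):
    """InfoNCE: mean_i [f_ii - log(mean_j exp(f_ij))]; never exceeds log(batch size)."""
    n = len(f)
    return sum(f[i][i] - math.log(sum(math.exp(v) for v in f[i]) / n) for i in range(n)) / n
```

InfoNCE can never report more than log N nats for a batch of N samples, which matters when the capacity being estimated is large. SMILE's clipping targets the high variance of the unclipped exponential term in the DV bound.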
Abstract:We propose a computationally efficient algorithm for seizure detection. Instead of using a purely data-driven approach, we develop a hybrid model-based/data-driven method that combines convolutional neural networks with factor graph inference. On the CHB-MIT dataset, we demonstrate that the proposed method generalizes well in a 6-fold leave-4-patients-out evaluation. Moreover, we show that our algorithm achieves up to a 5% absolute improvement in performance over previous data-driven methods. It does so at a fraction of the computational complexity of prior work, which makes it suitable for real-time seizure detection.
Abstract:We present an introduction to model-based machine learning for communication systems. We begin by reviewing existing strategies for combining model-based algorithms and machine learning from a high-level perspective, and compare them to the conventional deep learning approach, which utilizes established deep neural network (DNN) architectures trained in an end-to-end manner. Then, we focus on symbol detection, which is one of the fundamental tasks of communication receivers. We show how the different strategies, namely conventional deep architectures, deep unfolding, and DNN-aided hybrid algorithms, can be applied to this problem. The last two approaches constitute a middle ground between purely model-based and solely DNN-based receivers. By focusing on this specific task, we highlight the advantages and drawbacks of each strategy, and present guidelines to facilitate the design of future model-based deep learning systems for communications.
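To contrast deep unfolding with a purely model-based detector, the sketch below unrolls a fixed number of projected-gradient iterations for detecting BPSK symbols x in y = Hx + noise. Each iteration corresponds to one "layer". In an unfolded network, the per-layer step sizes (and often the soft projection) would be trained from data, whereas here they are fixed placeholder values. The BPSK model, the tanh projection, and all parameter values are illustrative assumptions.

```python
import math

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def unfolded_pgd_detector(y, H, step_sizes=(0.3,) * 5,
                          sharpness=(1.0, 2.0, 4.0, 8.0, 16.0)):
    """Unrolled projected gradient descent for BPSK detection in y = Hx + noise.

    Each (step size, sharpness) pair acts as one layer's parameters; an unfolded
    network would learn them from data instead of using these fixed values.
    """
    Ht = transpose(H)
    x = [0.0] * len(H[0])
    for mu, beta in zip(step_sizes, sharpness):
        residual = [yi - hxi for yi, hxi in zip(y, matvec(H, x))]
        direction = matvec(Ht, residual)  # H^T (y - Hx) = -1/2 * gradient of ||y - Hx||^2
        # Gradient step, then a soft projection toward the BPSK alphabet {-1, +1}.
        x = [math.tanh(beta * (xi + mu * di)) for xi, di in zip(x, direction)]
    return [1 if xi >= 0 else -1 for xi in x]
```

Replacing the fixed tuples with values trained end-to-end on (y, x) pairs is what turns this model-based iteration into an unfolded network. The structure of each layer stays dictated by the channel model.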
Abstract:The design of methods for inference from time sequences has traditionally relied on statistical models that describe the relation between a latent desired sequence and the observed one. A broad family of model-based algorithms has been derived to carry out inference at controllable complexity using recursive computations over the factor graph representing the underlying distribution. An alternative model-agnostic approach utilizes machine learning (ML) methods. Here we propose a framework that combines model-based inference algorithms and data-driven ML tools for stationary time sequences. In the proposed approach, neural networks are developed to separately learn specific components of a factor graph describing the distribution of the time sequence, rather than the complete inference task. By exploiting the stationarity of this distribution, the resulting approach can be applied to sequences of varying temporal duration. Additionally, this approach facilitates the use of compact neural networks that can be trained with small training sets or, alternatively, can be used to improve upon existing deep inference systems. We present an inference algorithm based on learned stationary factor graphs, referred to as StaSPNet, which learns to implement the sum-product scheme from labeled data and can be applied to sequences of different lengths. Our experimental results demonstrate the ability of the proposed StaSPNet to learn to carry out accurate inference from small training sets for sleep stage detection using the Sleep-EDF dataset, as well as for symbol detection in digital communications with unknown channels.
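Stationarity is what allows one set of learned components to serve sequences of any length. Under a first-order Markov assumption, the transition factor is shared across time. It can therefore be estimated once by pooling consecutive-label pairs from labeled sequences of different lengths. The sketch shows only this learning step. It uses smoothed counts as a simple stand-in for the learned components described above. The function name and smoothing constant are assumptions.

```python
from collections import Counter

def learn_stationary_factors(label_sequences, num_states, smoothing=1.0):
    """Estimate a time-invariant prior p(s) and transition matrix p(s' | s)
    from labeled state sequences that may all have different lengths."""
    state_counts = Counter()
    trans_counts = Counter()
    for seq in label_sequences:
        state_counts.update(seq)
        # Every consecutive pair (s_t, s_{t+1}) contributes, whatever the length.
        trans_counts.update(zip(seq, seq[1:]))
    total = sum(state_counts.values())
    prior = [(state_counts[s] + smoothing) / (total + smoothing * num_states)
             for s in range(num_states)]
    trans = []
    for r in range(num_states):
        row_total = sum(trans_counts[(r, s)] for s in range(num_states))
        # Additive smoothing keeps unseen transitions non-zero (and rows defined).
        trans.append([(trans_counts[(r, s)] + smoothing) / (row_total + smoothing * num_states)
                      for s in range(num_states)])
    return prior, trans
```

The returned prior and transition matrix do not depend on any particular sequence length. The same factors can therefore be reused by a sum-product pass over a test sequence of any duration.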
Abstract:The design of symbol detectors in digital communication systems has traditionally relied on statistical channel models that describe the relation between the transmitted symbols and the observed signal at the receiver. Here we review a data-driven framework for symbol detector design that combines machine learning (ML) and model-based algorithms. In this hybrid approach, well-known channel-model-based algorithms such as the Viterbi method, BCJR detection, and multiple-input multiple-output (MIMO) soft interference cancellation (SIC) are augmented with ML-based algorithms to remove their channel-model dependence, allowing the receiver to learn to implement these algorithms solely from data. The resulting data-driven receivers are most suitable for systems where the underlying channel models are poorly understood, highly complex, or do not capture the underlying physics well. Our approach is unique in that it replaces only the channel-model-based computations with dedicated neural networks that can be trained from a small amount of data, while keeping the general algorithm intact. Our results demonstrate that these techniques can approach the near-optimal performance of the model-based algorithms without knowing the exact channel input-output statistical relationship, and in the presence of channel state information uncertainty.
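The Viterbi algorithm for a finite-memory channel shows how only the channel-model-based computation is replaced while the algorithm stays intact. In the sketch below, the detector touches the channel only through a pluggable `log_lik(y_t, state)`. A data-driven receiver would supply a small trained network there, whereas here a Gaussian intersymbol-interference (ISI) model stands in for it. BPSK symbols, the channel taps, the noise variance, and all function names are illustrative assumptions.

```python
import itertools

def viterbi_detect(ys, log_lik, memory, alphabet=(-1, 1)):
    """Most likely symbol sequence over a channel with `memory` symbols of ISI.

    A state is (x_t, x_{t-1}, ..., x_{t-memory}); log_lik(y_t, state) is the only
    channel-dependent computation, so it can be swapped for a trained network.
    """
    states = list(itertools.product(alphabet, repeat=memory + 1))
    # Symbols before t = 0 are unknown, so every initial state is allowed.
    score = {s: log_lik(ys[0], s) for s in states}
    backpointers = []
    for y in ys[1:]:
        new_score, pointers = {}, {}
        for s in states:
            # Predecessors share s's older symbols: s[1:] + (a,) for each symbol a.
            best = max((s[1:] + (a,) for a in alphabet), key=lambda p: score[p])
            new_score[s] = score[best] + log_lik(y, s)
            pointers[s] = best
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state; x_t is the first entry of each state.
    s = max(states, key=lambda st: score[st])
    path = [s]
    for pointers in reversed(backpointers):
        s = pointers[s]
        path.append(s)
    return [st[0] for st in reversed(path)]

def gaussian_isi_log_lik(taps, noise_var):
    """Model-based stand-in for the learned network: y_t = sum_k taps[k] x_{t-k} + noise."""
    def log_lik(y, state):
        mean = sum(h * x for h, x in zip(taps, state))
        return -((y - mean) ** 2) / (2 * noise_var)
    return log_lik
```

Swapping `gaussian_isi_log_lik` for any other callable with the same signature leaves `viterbi_detect` unchanged, which is the property the hybrid approach relies on.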
Abstract:Many important schemes in signal processing and communications, ranging from the BCJR algorithm to the Kalman filter, are instances of factor graph methods. This family of algorithms is based on recursive message-passing computations carried out over graphical models that represent a factorization of the underlying statistics. Consequently, in order to implement these algorithms, one must have accurate knowledge of the statistical model of the considered signals. In this work, we propose to implement factor graph methods in a data-driven manner. In particular, we propose to use machine learning (ML) tools to learn the factor graph rather than the overall system task; the learned graph is then used for inference by message passing. We apply the proposed approach to learn the factor graph representing a finite-memory channel, demonstrating the resulting ability to implement BCJR detection in a data-driven fashion. We demonstrate that the proposed system, referred to as BCJRNet, learns to implement the BCJR algorithm from a small training set, and that the resulting receiver exhibits improved robustness to inaccurate training compared to the conventional channel-model-based receiver operating under the same level of uncertainty. Our results indicate that, by utilizing ML tools to learn factor graphs from labeled data, one can implement in a data-driven fashion a broad range of model-based algorithms that traditionally require full knowledge of the underlying statistics.
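For a finite-memory channel, message passing over the learned factor graph amounts to running the BCJR forward-backward recursions with learned factor nodes. The sketch computes per-symbol posteriors on the trellis and takes the factor node as a pluggable `lik(y_t, state)`. A Gaussian ISI model stands in for the learned component. BPSK symbols, uniform i.i.d. symbol priors, the taps, and the function names are assumptions for illustration.

```python
import itertools
import math

def bcjr_app(ys, lik, memory, alphabet=(-1, 1)):
    """Per-symbol posteriors P(x_t = a | all observations) over an ISI trellis.

    A state is (x_t, x_{t-1}, ..., x_{t-memory}); lik(y_t, state) is the factor
    node a learned receiver would supply instead of a channel model. Symbols are
    assumed i.i.d. uniform, so all transitions carry equal weight.
    """
    states = list(itertools.product(alphabet, repeat=memory + 1))
    T = len(ys)

    def normalize(d):
        z = sum(d.values())
        return {s: v / z for s, v in d.items()}

    # Forward messages: predecessors of s are s[1:] + (a,) for every symbol a.
    alpha = [normalize({s: lik(ys[0], s) for s in states})]
    for t in range(1, T):
        alpha.append(normalize({
            s: lik(ys[t], s) * sum(alpha[t - 1][s[1:] + (a,)] for a in alphabet)
            for s in states}))
    # Backward messages: successors of s are (c,) + s[:-1] for every symbol c.
    beta = [None] * T
    beta[T - 1] = {s: 1.0 for s in states}
    for t in range(T - 2, -1, -1):
        beta[t] = normalize({
            s: sum(lik(ys[t + 1], (c,) + s[:-1]) * beta[t + 1][(c,) + s[:-1]]
                   for c in alphabet)
            for s in states})
    # Combine, then marginalize each state posterior onto the newest symbol x_t.
    apps = []
    for t in range(T):
        post = normalize({s: alpha[t][s] * beta[t][s] for s in states})
        apps.append({a: sum(v for s, v in post.items() if s[0] == a) for a in alphabet})
    return apps

def gaussian_isi_lik(taps, noise_var):
    """Model-based stand-in for the learned factor node."""
    def lik(y, state):
        mean = sum(h * x for h, x in zip(taps, state))
        return math.exp(-((y - mean) ** 2) / (2 * noise_var))
    return lik
```

A Viterbi detector returns a single best sequence. This pass instead returns soft outputs P(x_t = a | y), which can be handed to a subsequent channel decoder.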