Abstract: The existence of representative datasets is a prerequisite for many successful artificial intelligence and machine learning models. However, the subsequent application of these models often involves scenarios that are inadequately represented in the training data. The reasons for this are manifold, ranging from time and cost constraints to ethical considerations. As a consequence, the reliable use of these models, especially in safety-critical applications, is a major challenge. Leveraging additional, already existing sources of knowledge is key to overcoming the limitations of purely data-driven approaches and, eventually, to increasing the generalization capability of these models. Furthermore, predictions that conform with knowledge are crucial for making trustworthy and safe decisions, even in underrepresented scenarios. This work provides an overview of existing techniques and methods in the literature that combine data-based models with existing knowledge. The identified approaches are structured into three categories: integration, extraction, and conformity. Special attention is given to applications in the field of autonomous driving.
Abstract: Distance-based dynamic texture recognition is an important research field in multimedia processing, with applications ranging from retrieval to segmentation of video data. Based on the conjecture that the most distinctive characteristic of a dynamic texture is the appearance of its individual frames, this work proposes to describe dynamic textures as kernelized spaces of frame-wise feature vectors computed using the Scattering transform. By combining these spaces with a basis-invariant metric, we obtain a framework that produces competitive results for nearest neighbor classification and state-of-the-art results for nearest class center classification.
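A minimal sketch of the idea, assuming the frame-wise feature vectors (for example, Scattering coefficients) have already been computed and are stacked row-wise per video. Each video is represented by the span of its top `dim` kernel principal components under a Gaussian RBF kernel, and two such spans are compared with the projection (chordal) distance, which depends only on the spaces and not on the bases chosen for them. The RBF kernel, `gamma`, the subspace dimension, the absence of kernel centering, and all function names are illustrative assumptions rather than the metric used in the work; nearest class center classification, which needs a mean of such spaces, is not sketched.

```python
import numpy as np

def rbf_kernel(x, y, gamma):
    """Gaussian RBF kernel matrix between the rows of x (n, d) and y (m, d)."""
    sq = np.sum(x**2, axis=1)[:, None] + np.sum(y**2, axis=1)[None, :] - 2.0 * x @ y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kernel_subspace(frames, dim, gamma):
    """Coefficients alpha such that Phi @ alpha (Phi = mapped frames as columns)
    is an orthonormal basis of the top-`dim` uncentered kernel principal subspace."""
    vals, vecs = np.linalg.eigh(rbf_kernel(frames, frames, gamma))  # ascending order
    vals, vecs = vals[::-1][:dim], vecs[:, ::-1][:, :dim]
    return vecs / np.sqrt(vals)  # ensures alpha^T K alpha = identity

def subspace_distance(frames_a, frames_b, dim=3, gamma=0.5):
    """Projection distance ||P_a - P_b||_F between two kernel subspaces.
    It depends only on the subspaces, not on the chosen bases."""
    alpha_a = kernel_subspace(frames_a, dim, gamma)
    alpha_b = kernel_subspace(frames_b, dim, gamma)
    # Inner products between the two orthonormal bases in feature space.
    cross = alpha_a.T @ rbf_kernel(frames_a, frames_b, gamma) @ alpha_b
    return float(np.sqrt(max(2.0 * dim - 2.0 * np.sum(cross**2), 0.0)))

def nearest_neighbor(query, gallery, labels, **kwargs):
    """Label of the gallery video whose frame space is closest to the query's."""
    dists = [subspace_distance(query, g, **kwargs) for g in gallery]
    return labels[int(np.argmin(dists))]

# Toy data: two "textures" whose 4-D frame features cluster around different means.
rng = np.random.default_rng(0)
video_a1, video_a2 = (0.1 * rng.standard_normal((10, 4)) for _ in range(2))
video_b1, video_b2 = (3.0 + 0.1 * rng.standard_normal((10, 4)) for _ in range(2))
gallery, labels = [video_a1, video_b1], ["a", "b"]
print(nearest_neighbor(video_a2, gallery, labels))
```

Because each video is reduced to the space spanned by its frames, shuffling the frames leaves the distance unchanged, which matches the appearance-based premise of the approach.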
Abstract: Research in machine learning is at a turning point. While supervised deep learning has conquered the field at a breathtaking pace and demonstrated the ability to solve inference problems with unprecedented accuracy, it still does not quite live up to its name if we think of learning as the process of acquiring knowledge about a subject or problem. Major weaknesses of present-day deep learning models include, for instance, their lack of adaptability to changes in the environment and their inability to perform tasks other than the one they were trained for. While it is still unclear how to overcome these limitations, one can observe a paradigm shift within the machine learning community: research interests are shifting away from increasing the performance of highly parameterized models on exceedingly specific tasks, and towards employing machine learning algorithms in highly diverse domains. This question can be approached from different angles. For instance, the field of Informed AI investigates the problem of infusing domain knowledge into a machine learning model by using techniques such as regularization, data augmentation, or post-processing. On the other hand, a remarkable number of works in recent years have focused on developing models that by themselves guarantee a certain degree of versatility and invariance with respect to the domain or problem at hand. Thus, rather than investigating how to provide domain-specific knowledge to machine learning models, these works explore methods that equip the models with the capability of acquiring the knowledge by themselves. This white paper provides an introduction to and discussion of this emerging field in machine learning research. To this end, it reviews the role of knowledge in machine learning and discusses its relation to the concept of invariance, before providing a literature review of the field.
Abstract: This work studies the problem of modeling non-linear visual processes by learning linear generative models from observed sequences. We propose a joint learning framework that combines a Linear Dynamic System with a convolutional Variational Autoencoder. After discussing several conditions for linearizing neural networks, we propose an architecture that allows Variational Autoencoders to simultaneously learn the non-linear observation model and the linear state transition from a sequence of observed frames. The proposed framework is demonstrated experimentally in three series of synthesis experiments.
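The linear part of such a model can be illustrated in isolation. The sketch below assumes a latent sequence z_1, ..., z_T is already available (for instance, the encoder means of a trained VAE), fits the state-transition matrix A of z_{t+1} ≈ A z_t by least squares, and rolls A out to synthesize new latent states. In the proposed framework the transition is learned jointly with the network; the separate least-squares step, the function names, and the omitted decoder are simplifying assumptions of this sketch.

```python
import numpy as np

def fit_transition(latents):
    """Least-squares estimate of A in z_{t+1} ≈ A z_t.

    latents: (T, n) array whose rows are the latent states z_1, ..., z_T.
    """
    z_prev = latents[:-1].T  # (n, T-1): z_1 ... z_{T-1}
    z_next = latents[1:].T   # (n, T-1): z_2 ... z_T
    # Minimizes ||z_next - A z_prev||_F via the Moore-Penrose pseudoinverse.
    return z_next @ np.linalg.pinv(z_prev)

def rollout(a, z0, steps):
    """Synthesize `steps` latent states by iterating the linear transition."""
    states = [np.asarray(z0, dtype=float)]
    for _ in range(steps - 1):
        states.append(a @ states[-1])
    return np.stack(states)  # (steps, n); a decoder would map each row to a frame

# Example: recover a known stable transition from a noise-free latent sequence.
rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
a_true = 0.95 * q  # orthogonal matrix scaled so that the system is stable
latents = rollout(a_true, rng.standard_normal(4), steps=30)
a_hat = fit_transition(latents)
```

Keeping the transition linear means synthesis reduces to repeated matrix multiplication in latent space, with all non-linearity left to the observation model.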
Abstract: Recent research in image and video recognition indicates that many visual processes can be thought of as being generated by a time-varying generative model. A natural descriptive model for visual processes is thus a statistical distribution that varies over time. Specifically, modeling visual processes as streams of histograms generated by a kernelized linear dynamic system turns out to be efficient. We refer to such a model as a System of Bags. In this work, we investigate Systems of Bags with special emphasis on dynamic scenes and dynamic textures. The parameters of linear dynamic systems are ambiguous; for instance, a change of basis of the state space yields a different parameter set that describes the same system. In order to cope with these ambiguities in the kernelized setting, we develop a kernelized version of the alignment distance. For its computation, we use a Jacobi-type method and prove its convergence to a set of critical points. We employ it as a dissimilarity measure on Systems of Bags. As such, it outperforms other known dissimilarity measures for kernelized linear dynamic systems, in particular the Martin Distance and the Maximum Singular Value Distance, in every tested classification setting. A considerable margin can be observed in settings where classification is performed with respect to an abstract mean of video sets. In this scenario, the presented approach can outperform state-of-the-art techniques such as Dynamic Fractal Spectrum and Orthogonal Tensor Dictionary Learning.
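To see why basis-invariant measures are needed, the sketch below implements one of the baselines named above, the Martin Distance, for ordinary (non-kernelized) systems x_{t+1} = A x_t, y_t = C x_t. It compares the column spaces of the extended observability matrices through their principal angles, with d^2 = -log of the product of squared cosines, so a change of state basis (A, C) → (P A P^-1, C P^-1) leaves it unchanged. The finite `horizon` is an assumption standing in for the infinite observability matrix of the definition, and this is neither the kernelized setting nor the proposed alignment distance.

```python
import numpy as np

def observability(a, c, horizon):
    """Truncated extended observability matrix [C; CA; ...; CA^(horizon-1)]."""
    blocks, block = [], c
    for _ in range(horizon):
        blocks.append(block)
        block = block @ a
    return np.vstack(blocks)

def martin_distance(a1, c1, a2, c2, horizon=20):
    """Truncated Martin distance via principal angles between observability subspaces."""
    q1, _ = np.linalg.qr(observability(a1, c1, horizon))
    q2, _ = np.linalg.qr(observability(a2, c2, horizon))
    # Singular values of q1^T q2 are the cosines of the principal angles.
    cosines = np.clip(np.linalg.svd(q1.T @ q2, compute_uv=False), 1e-12, 1.0)
    # d^2 = -log(prod cos^2) = -2 * sum(log cos)
    return float(np.sqrt(-2.0 * np.sum(np.log(cosines))))

rng = np.random.default_rng(1)
a1 = 0.9 * np.linalg.qr(rng.standard_normal((2, 2)))[0]
c1 = rng.standard_normal((4, 2))
p = rng.standard_normal((2, 2)) + 3.0 * np.eye(2)  # well-conditioned change of basis
p_inv = np.linalg.inv(p)
# Same system written in another state basis: different parameters, same behavior.
d_same = martin_distance(a1, c1, p @ a1 @ p_inv, c1 @ p_inv)
a2 = 0.9 * np.linalg.qr(rng.standard_normal((2, 2)))[0]
c2 = rng.standard_normal((4, 2))
d_other = martin_distance(a1, c1, a2, c2)
```

Comparing (A, C) entry by entry would report a large difference for the two realizations of the same system, whereas the subspace-based distance is essentially zero.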
Abstract: This work studies the problem of content-based image retrieval, specifically texture retrieval. It focuses on feature extraction and similarity measures for texture images. Our approach employs a recently developed method, the so-called Scattering transform, for feature extraction in texture retrieval. The Scattering transform has the distinctive property of providing a robust representation that is stable with respect to spatial deformations. Recent work has demonstrated its effectiveness for texture classification, which makes it a promising candidate for texture retrieval. Moreover, we adopt the common approach of measuring the similarity of textures by comparing the subband histograms of a filterbank transform. To this end, we derive a similarity measure based on the popular Bhattacharyya Kernel. Although describing histograms with parametrized probability density functions such as the Generalized Gaussian Distribution is popular, this model is unfortunately not applicable to most Scattering transform subbands, because the complex modulus applied to each of them yields nonnegative coefficients. In this work, we propose to use the Weibull distribution to model the Scattering subbands of descendant layers. Our numerical experiments demonstrate the effectiveness of the proposed approach in comparison with several state-of-the-art methods.
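A minimal sketch of a Weibull-based retrieval measure, assuming each texture is already represented by the coefficient magnitudes of its Scattering subbands (computed elsewhere). It fits a Weibull distribution to each subband by maximum likelihood (bisection on the shape equation, closed-form scale) and compares two fitted models with the Bhattacharyya coefficient, integrated numerically over log x. Summing the per-subband coefficients into one score, the integration limits, and all function names are assumptions for illustration; the measure derived in the work may combine subbands differently.

```python
import numpy as np

def fit_weibull(samples, lo=1e-2, hi=50.0, iters=100):
    """Maximum-likelihood Weibull fit; returns (shape k, scale lam)."""
    x = np.asarray(samples, dtype=float).ravel()
    x = x[x > 0]                  # Weibull support is x > 0
    x_max = x.max()
    y = x / x_max                 # the shape MLE is invariant to rescaling
    log_y = np.log(y)
    mean_log = log_y.mean()

    def score(k):
        # Increasing in k; its root is the MLE of the shape parameter.
        w = y ** k
        return np.sum(w * log_y) / np.sum(w) - 1.0 / k - mean_log

    for _ in range(iters):        # bisection on [lo, hi]
        mid = 0.5 * (lo + hi)
        if score(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    lam = x_max * np.mean(y ** k) ** (1.0 / k)  # closed-form scale MLE given k
    return k, lam

def _weibull_logpdf_at_log(t, k, lam):
    """Log-density of Weibull(k, lam) evaluated at x = exp(t)."""
    z = t - np.log(lam)           # log(x / lam)
    return np.log(k) - np.log(lam) + (k - 1.0) * z - np.exp(k * z)

def bhattacharyya_weibull(p, q, n=20000):
    """Bhattacharyya coefficient between Weibull models p=(k1, lam1), q=(k2, lam2)."""
    (k1, l1), (k2, l2) = p, q
    # Integrate over t = log x between limits where both densities are negligible.
    t_lo = min(np.log(l1) + np.log(1e-16) / k1, np.log(l2) + np.log(1e-16) / k2)
    t_hi = max(np.log(l1) + np.log(50.0) / k1, np.log(l2) + np.log(50.0) / k2)
    t = np.linspace(t_lo, t_hi, n)
    # sqrt(p(x) q(x)) dx, with dx = exp(t) dt
    f = np.exp(0.5 * (_weibull_logpdf_at_log(t, k1, l1)
                      + _weibull_logpdf_at_log(t, k2, l2)) + t)
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t)))  # trapezoidal rule

def texture_similarity(subbands_a, subbands_b):
    """Sum of per-subband Bhattacharyya coefficients (aggregation is an assumption)."""
    return sum(bhattacharyya_weibull(fit_weibull(a), fit_weibull(b))
               for a, b in zip(subbands_a, subbands_b))
```

For two Weibull models with the same shape k, the coefficient has the closed form 2√(ab)/(a+b) with a = λ₁^(-k) and b = λ₂^(-k), which gives a quick check of the numerical integration.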