Abstract: The following work addresses the problem of frameworks for data stream processing that can be used to evaluate solutions in an environment resembling real-world applications. The definition of structured frameworks stems from the need to reliably evaluate data stream classification methods under the constraints of delayed and limited label access. Current experimental evaluation often freely exploits the assumption of complete and immediate label access to monitor recognition quality and to adapt methods to changing concepts. The problem is addressed by reviewing currently described methods and techniques for data stream processing and verifying their outcomes in a simulated environment. The outcome of this work is a proposed taxonomy of data stream processing frameworks, showing the linkage between drift detection and classification methods while accounting for the natural phenomenon of label delay.
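The evaluation constraint discussed above can be illustrated with a minimal sketch of a chunk-based test-then-train loop in which labels arrive only after a fixed delay. This is not the proposed framework's API; the classifier, chunk sizes, and the `LABEL_DELAY` parameter are assumptions for illustration only.

```python
# Minimal sketch (not the proposed framework): test-then-train evaluation of a
# chunk-based stream whose labels become available LABEL_DELAY chunks later.
from collections import deque

import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
chunks = [(rng.normal(size=(200, 5)), rng.integers(0, 2, 200)) for _ in range(50)]

LABEL_DELAY = 3               # assumed delay: labels arrive 3 chunks after the data
clf = GaussianNB()
pending = deque()             # chunks still waiting for their labels
scores = []

for X, y in chunks:
    if hasattr(clf, "classes_"):          # test first, on unlabeled incoming data
        scores.append(accuracy_score(y, clf.predict(X)))
    pending.append((X, y))
    if len(pending) > LABEL_DELAY:        # labels for an old chunk finally arrive
        X_old, y_old = pending.popleft()
        clf.partial_fit(X_old, y_old, classes=[0, 1])

print(f"mean delayed-label accuracy: {np.mean(scores):.3f}")
```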
Abstract: Practical applications of artificial intelligence increasingly often have to deal with the streaming properties of real data which, over time, are subject to phenomena such as periodicity and more or less chaotic degeneration, resulting directly in concept drifts. Modern concept drift detectors almost always assume immediate access to labels, an assumption shown to be unrealistic given label cost, limited availability, and possible delay. This work proposes an unsupervised Parallel Activations Drift Detector that utilizes the outputs of an untrained neural network, presenting its key design elements, intuitions about its processing properties, and a pool of computer experiments demonstrating its competitiveness with state-of-the-art methods.
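The intuition behind using an untrained network can be conveyed with a conceptual sketch: a frozen, randomly initialized network maps incoming chunks to activations whose statistics are compared against a reference to flag drift. This is not the proposed detector itself; the network architecture, the distance measure, and the `THRESHOLD` value are illustrative assumptions.

```python
# Illustrative sketch only (not the Parallel Activations Drift Detector): a
# frozen, randomly initialized network provides an unsupervised representation
# whose chunk-wise statistics are compared against a reference to flag drift.
import numpy as np
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(torch.nn.Linear(5, 32), torch.nn.ReLU(),
                          torch.nn.Linear(32, 8)).eval()   # never trained

def activation_stats(X):
    with torch.no_grad():
        return net(torch.as_tensor(X, dtype=torch.float32)).mean(dim=0).numpy()

rng = np.random.default_rng(1)
reference = activation_stats(rng.normal(size=(500, 5)))    # stationary chunk
THRESHOLD = 0.5                                            # assumed, task-dependent

for t in range(20):
    shift = 2.0 if t >= 10 else 0.0                        # synthetic drift at t = 10
    chunk = rng.normal(loc=shift, size=(500, 5))
    distance = np.linalg.norm(activation_stats(chunk) - reference)
    if distance > THRESHOLD:
        print(f"chunk {t}: drift suspected (distance={distance:.2f})")
```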
Abstract: In recent years, Deep Neural Network-based systems have not only increased in popularity but have also received growing user trust. However, due to the closed-world assumption of such systems, they cannot recognize samples from unknown classes and often assign an incorrect label with high confidence. The presented work examines the evaluation of methods for Open Set Recognition, focusing on the impact of class imbalance, especially in the dichotomy between known and unknown samples. As an outcome of the problem analysis, we present a set of guidelines for the evaluation of methods in this field.
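A short worked example may help make the evaluated quantities concrete. The Openness formulation below follows the one commonly attributed to Scheirer et al. in the Open Set Recognition literature (variants exist), and the sample counts used for the known-to-unknown imbalance ratio are purely illustrative.

```python
# Worked example: one common Openness formulation from the OSR literature and
# the known-to-unknown sample imbalance that accompanies the evaluation.
import math

def openness(n_train_classes, n_test_classes):
    # higher values mean a larger share of classes unseen during training
    return 1.0 - math.sqrt(2 * n_train_classes / (n_train_classes + n_test_classes))

# e.g. 6 known classes seen in training, 10 classes appearing at test time
print(f"openness: {openness(6, 10):.3f}")

# even at fixed Openness, the known/unknown dichotomy can be heavily imbalanced
known_samples, unknown_samples = 6000, 400      # illustrative counts
print(f"known-to-unknown imbalance ratio: {known_samples / unknown_samples:.1f}")
```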
Abstract: The article presents the torchosr package, a Python package compatible with the PyTorch library, offering tools and methods dedicated to Open Set Recognition in Deep Neural Networks. The package offers two state-of-the-art methods in the field, a set of functions for handling base sets and generating derived sets for the Open Set Recognition task (where some classes are considered unknown and used only in the testing process), and additional tools for handling datasets and methods. The main goal of the package proposal is to simplify and promote correct experimental evaluation, in which experiments are carried out on a large number of derived sets with varying Openness and class-to-category assignments. The authors hope that the state-of-the-art methods available in the package will become a source of correct, open-source implementations of the relevant solutions in the domain.
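The notion of a derived set can be sketched independently of the package. The snippet below is not the torchosr API; it only illustrates, with plain PyTorch and torchvision, how a closed-set benchmark can be split so that an assumed subset of classes remains known for training while the rest appear only at test time.

```python
# Conceptual sketch (not the torchosr API): deriving an open-set variant of a
# closed-set dataset by declaring some classes "unknown" and reserving them
# for the test phase. The class-to-category assignment below is illustrative.
import torch
from torchvision import datasets, transforms

base = datasets.MNIST(root="data", download=True,
                      transform=transforms.ToTensor())

known = torch.tensor([0, 1, 2, 3, 4, 5])   # training classes; digits 6-9 act as "unknown"
train_idx = torch.isin(base.targets, known).nonzero(as_tuple=True)[0]

train_set = torch.utils.data.Subset(base, train_idx.tolist())   # known classes only
test_set = base                                                  # mixes known and unknown
print(len(train_set), len(test_set))
```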
Abstract: Assessing the complexity of a classification problem is an essential element of many topics in the supervised learning domain. It plays a significant role in meta-learning, where it becomes the basis for determining meta-attributes, and in multi-criteria optimization, where it allows evaluating training set resampling without needing to rebuild the recognition model. The tools currently available to the academic community that enable the calculation of problem complexity measures exist only as C++ and R libraries. This paper describes a software module that allows the estimation of 22 complexity measures in the Python language, compatible with the scikit-learn programming interface, enabling research that uses them in the most popular programming environment of the machine learning community.
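For illustration, the sketch below computes one classic complexity measure, Fisher's discriminant ratio (F1, in the sense of Ho and Basu), for a synthetic two-class problem. It is not the described module's implementation, only an assumed example of the kind of measure such a scikit-learn-compatible interface would expose.

```python
# Illustrative sketch of one classic complexity measure (Fisher's discriminant
# ratio, F1) for a two-class problem; not the described module's code.
import numpy as np
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=300, n_features=4, random_state=0)

def fisher_discriminant_ratio(X, y):
    X0, X1 = X[y == 0], X[y == 1]
    # per-feature ratio of between-class separation to within-class spread
    ratios = (X0.mean(axis=0) - X1.mean(axis=0)) ** 2 / (
        X0.var(axis=0) + X1.var(axis=0) + 1e-12)
    return ratios.max()       # the most discriminative feature defines F1

print(f"F1 = {fisher_discriminant_ratio(X, y):.3f}")
```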