Paweł Ksieniewicz

Structuring the Processing Frameworks for Data Stream Evaluation and Application

Nov 11, 2024

Joanna Komorniczak, Paweł Ksieniewicz, Paweł Zyblewski

Abstract: This work addresses the problem of frameworks for data stream processing that can be used to evaluate solutions in an environment resembling real-world applications. The definition of structured frameworks stems from the need to reliably evaluate data stream classification methods under the constraints of delayed and limited label access. Current experimental evaluation often relies without restraint on the assumption of complete and immediate label access, both to monitor recognition quality and to adapt methods to changing concepts. The problem is addressed by reviewing currently described methods and techniques for data stream processing and verifying their outcomes in a simulated environment. The result of the work is a proposed taxonomy of data stream processing frameworks, showing the linkage between drift detection and classification methods under the natural phenomenon of label delay.
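To make the label-delay constraint concrete, here is a minimal sketch of how a chunk-based loop might track when labels become usable. It is not taken from the paper: the function name `delayed_label_schedule` and the fixed, chunk-counted `delay` are assumptions made purely for illustration.

```python
from collections import deque


def delayed_label_schedule(n_chunks, delay):
    """Pair each step t with the chunk whose labels arrive at that step.

    Hypothetical helper: labels of chunk t are assumed to arrive exactly
    `delay` chunks later; None means no labels are available yet.
    """
    pending = deque()   # chunks that were predicted but not yet labeled
    schedule = []
    for t in range(n_chunks):
        pending.append(t)  # chunk t is predicted now, labels come later
        arrived = pending.popleft() if len(pending) > delay else None
        schedule.append((t, arrived))
    return schedule
```

With `delay=0` every chunk is labeled immediately, which corresponds to the idealized assumption the abstract criticizes; with `delay=2` the first two steps have no labels to evaluate or train on.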

Employing Sentence Space Embedding for Classification of Data Stream from Fake News Domain

Jul 15, 2024

Paweł Zyblewski, Jakub Klikowski, Weronika Borek-Marciniec, Paweł Ksieniewicz

Abstract: Tabular data is considered the last unconquered castle of deep learning, yet data stream classification is stated to be an equally important and demanding research area. Due to the temporal constraints, deep learning methods are assumed not to be the optimal solution in this field. However, excluding the entire -- and prevalent -- group of methods seems rather rash given the progress made in recent years in their development. For this reason, the paper is the first to present an approach to natural language data stream classification using the sentence space method, which encodes text in the form of a discrete digital signal. This allows convolutional deep networks dedicated to image classification to be used for recognizing fake news from text data. Using the real-life Fakeddit dataset, the proposed approach was compared with state-of-the-art algorithms for data stream classification in terms of generalization ability and time complexity.

* 8 pages, 8 figures

Unsupervised Concept Drift Detection based on Parallel Activations of Neural Network

Apr 11, 2024

Joanna Komorniczak, Paweł Ksieniewicz

Abstract: Practical applications of artificial intelligence increasingly have to deal with the streaming properties of real data, which, considering the time factor, are subject to phenomena such as periodicity and more or less chaotic degeneration, resulting directly in concept drift. Modern concept drift detectors almost always assume immediate access to labels, which, due to their cost, limited availability, and possible delay, has been shown to be unrealistic. This work proposes an unsupervised Parallel Activations Drift Detector that utilizes the outputs of an untrained neural network, and presents its key design elements, intuitions about its processing properties, and a pool of computer experiments demonstrating its competitiveness with state-of-the-art methods.
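The abstract gives no algorithmic detail, so the following is only a loose illustration of the general idea of label-free drift detection with an untrained network, not the Parallel Activations Drift Detector itself. The single random `tanh` layer, the helper names `random_network` and `drift_score`, and the standardized mean-difference statistic are all assumptions.

```python
import numpy as np


def random_network(n_features, n_outputs=8, seed=0):
    """Return a fixed, never-trained one-layer network (hypothetical stand-in)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(n_features, n_outputs))
    b = rng.normal(size=n_outputs)
    return lambda X: np.tanh(X @ W + b)


def drift_score(ref_act, cur_act):
    """Largest absolute standardized difference between mean activations.

    Compares a reference chunk with the current chunk, output by output,
    without using any labels. A large value suggests a distribution change.
    """
    se = np.sqrt(
        ref_act.var(axis=0, ddof=1) / len(ref_act)
        + cur_act.var(axis=0, ddof=1) / len(cur_act)
    )
    z = np.abs(ref_act.mean(axis=0) - cur_act.mean(axis=0)) / (se + 1e-12)
    return float(z.max())
```

In use, one would pass a reference chunk and each incoming chunk through the same `random_network` and flag drift when `drift_score` exceeds a chosen threshold; the threshold itself would need tuning and is not specified here.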

Active Weighted Aging Ensemble for Drifted Data Stream Classification

Dec 19, 2021

Michał Woźniak, Paweł Zyblewski, Paweł Ksieniewicz

Abstract: One of the significant problems of streaming data classification is the occurrence of concept drift, that is, a change in the probabilistic characteristics of the classification task. This phenomenon destabilizes the classification model and seriously degrades its quality. An appropriate strategy is required to adapt the classifier to the changing probabilistic characteristics. A significant obstacle to implementing such a solution is access to data labels. Labeling is usually costly, so to minimize the related expenses, learning strategies based on semi-supervised learning are proposed, e.g., active learning methods that indicate which incoming objects are worth labeling to improve the classifier's performance. This paper proposes a novel chunk-based method for non-stationary data streams based on classifier ensemble learning and an active learning strategy with a limited budget, which can be applied to any data stream classification algorithm. The proposed method has been evaluated through computer experiments using both real and generated data streams. The results confirm the high quality of the proposed algorithm compared with state-of-the-art methods.
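As a rough illustration of budget-limited active learning on a chunk, the sketch below picks the samples the current model is least sure about. It uses plain margin-based uncertainty sampling, which is a common generic strategy and not necessarily the selection rule used in the paper; `select_for_labeling` and the fractional `budget` parameter are hypothetical names.

```python
import numpy as np


def select_for_labeling(proba, budget):
    """Return indices of chunk samples to send for labeling.

    proba  : array of shape (n_samples, n_classes) with predicted probabilities
    budget : fraction of the chunk that may be labeled (assumed in [0, 1])
    """
    n = proba.shape[0]
    k = int(np.floor(budget * n))
    if k == 0:
        return np.array([], dtype=int)
    # Margin between the two most probable classes; small margin = uncertain.
    top2 = np.sort(proba, axis=1)[:, -2:]
    margin = top2[:, 1] - top2[:, 0]
    return np.argsort(margin, kind="stable")[:k]
```

Only the returned samples would be labeled and used to update the classifier; the rest of the chunk is discarded or handled by whatever semi-supervised rule the method defines.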

* 29 pages, 3 figures

stream-learn -- open-source Python library for difficult data stream batch analysis

Jan 29, 2020

Paweł Ksieniewicz, Paweł Zyblewski

Abstract: stream-learn is a Python package compatible with scikit-learn and developed for the analysis of drifting and imbalanced data streams. Its main component is a stream generator, which can produce a synthetic data stream incorporating any of the three main concept drift types (i.e., sudden, gradual, and incremental drift) in their recurring or non-recurring versions. The package allows experiments to be conducted following established evaluation methodologies (i.e., Test-Then-Train and Prequential). In addition, estimators adapted for data stream classification have been implemented, including both simple classifiers and state-of-the-art chunk-based and online classifier ensembles. To improve computational efficiency, the package uses its own implementations of prediction metrics for imbalanced binary classification tasks.
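The Test-Then-Train protocol mentioned above can be sketched without the library. The code below does not use the stream-learn API; the toy classifier `IncrementalCentroids`, the generator `make_stream` (two Gaussian classes whose means swap at a sudden drift), and `run_test_then_train` are all hypothetical names written for this illustration.

```python
import numpy as np


class IncrementalCentroids:
    """Toy incremental nearest-centroid classifier (illustrative only)."""

    def __init__(self, n_classes, n_features):
        self.sums = np.zeros((n_classes, n_features))
        self.counts = np.zeros(n_classes)

    def partial_fit(self, X, y):
        # Accumulate per-class feature sums and sample counts.
        for c in range(len(self.counts)):
            mask = y == c
            self.sums[c] += X[mask].sum(axis=0)
            self.counts[c] += mask.sum()
        return self

    def predict(self, X):
        centroids = self.sums / np.maximum(self.counts, 1)[:, None]
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        return dists.argmin(axis=1)


def make_stream(n_chunks=20, chunk_size=100, drift_at=10, seed=0):
    """Yield (X, y) chunks; class means swap sides at chunk `drift_at`."""
    rng = np.random.default_rng(seed)
    for i in range(n_chunks):
        y = rng.integers(0, 2, chunk_size)
        sign = 1.0 if i < drift_at else -1.0  # sudden drift
        X = rng.normal(0.0, 1.0, (chunk_size, 2)) + sign * 2.0 * (2 * y[:, None] - 1)
        yield X, y


def run_test_then_train(stream, model):
    """Test on each chunk first, then train on it; return per-chunk accuracy."""
    scores = []
    for i, (X, y) in enumerate(stream):
        if i > 0:  # the first chunk is used for training only
            scores.append(float((model.predict(X) == y).mean()))
        model.partial_fit(X, y)
    return scores
```

Calling `run_test_then_train(make_stream(), IncrementalCentroids(2, 2))` returns one accuracy value per tested chunk. Because this toy model never forgets old data, its accuracy should collapse at the drift point and recover only slowly, which is the kind of behavior the drift-aware ensembles in the package are meant to address.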
