Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Eduardo V. L. Barboza

CaDrift: A Time-dependent Causal Generator of Drifting Data Streams

Feb 23, 2026

Eduardo V. L. Barboza, Jean Paul Barddal, Robert Sabourin, Rafael M. O. Cruz

Abstract:This work presents Causal Drift Generator (CaDrift), a time-dependent synthetic data generator framework based on Structural Causal Models (SCMs). The framework produces a virtually infinite combination of data streams with controlled shift events and time-dependent data, making it a tool to evaluate methods under evolving data. CaDrift synthesizes various distributional and covariate shifts by drifting mapping functions of the SCM, which change underlying cause-and-effect relationships between features and the target. In addition, CaDrift models occasional perturbations by leveraging interventions in causal modeling. Experimental results show that, after distributional shift events, the accuracy of classifiers tends to drop, followed by a gradual retrieval, confirming the generator's effectiveness in simulating shifts. The framework has been made available on GitHub.

* Paper submitted to ICLR 2026

Via

Access Paper or Ask Questions

Distance Functions and Normalization Under Stream Scenarios

Jul 04, 2023

Eduardo V. L. Barboza, Paulo R. Lisboa de Almeida, Alceu de Souza Britto Jr, Rafael M. O. Cruz

Figure 1 for Distance Functions and Normalization Under Stream Scenarios

Figure 2 for Distance Functions and Normalization Under Stream Scenarios

Figure 3 for Distance Functions and Normalization Under Stream Scenarios

Figure 4 for Distance Functions and Normalization Under Stream Scenarios

Abstract:Data normalization is an essential task when modeling a classification system. When dealing with data streams, data normalization becomes especially challenging since we may not know in advance the properties of the features, such as their minimum/maximum values, and these properties may change over time. We compare the accuracies generated by eight well-known distance functions in data streams without normalization, normalized considering the statistics of the first batch of data received, and considering the previous batch received. We argue that experimental protocols for streams that consider the full stream as normalized are unrealistic and can lead to biased and poor results. Our results indicate that using the original data stream without applying normalization, and the Canberra distance, can be a good combination when no information about the data stream is known beforehand.

* Paper accepted to the 2023 International Joint Conference on Neural Networks

Via

Access Paper or Ask Questions