Abstract: In many real-world applications, continuously operating machine learning (ML) systems are crucial but prone to data drift, a phenomenon in which discrepancies between historical training data and future test data lead to significant performance degradation and operational inefficiencies. Traditional drift adaptation methods typically update models using ensemble techniques, often discard drifted historical data, and focus primarily on either covariate drift or concept drift. These methods suffer from high resource demands, an inability to handle all drift types effectively, and a neglect of the valuable context that historical data can provide. We contend that explicitly incorporating drifted data into the model training process significantly enhances model accuracy and robustness. This paper introduces a framework that combines the strengths of data-centric approaches with adaptive management of both covariate and concept drift in a scalable and efficient manner. Our framework employs data segmentation techniques to identify the historical data batches that most closely reflect test-data patterns; these batches are then used to train models for the incoming test data, ensuring that the models remain relevant and accurate over time. By selecting and utilizing only the relevant data subsets, our solution maintains robust model accuracy in large-scale ML deployments while minimizing resource consumption and computational overhead, leading to significant cost savings. Experimental results on classification tasks over real-world and synthetic datasets show that our approach improves model accuracy while reducing operational costs and latency, overcoming inefficiencies in current methods and providing a robust, adaptable, and scalable solution.