Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:Temporally-Biased Sampling Schemes for Online Model Management

Jun 11, 2019

Brian Hentschel, Peter J. Haas, Yuanyuan Tian

Figure 1 for Temporally-Biased Sampling Schemes for Online Model Management

Figure 2 for Temporally-Biased Sampling Schemes for Online Model Management

Figure 3 for Temporally-Biased Sampling Schemes for Online Model Management

Figure 4 for Temporally-Biased Sampling Schemes for Online Model Management

Share this with someone who'll enjoy it:

Abstract:To maintain the accuracy of supervised learning models in the presence of evolving data streams, we provide temporally-biased sampling schemes that weight recent data most heavily, with inclusion probabilities for a given data item decaying over time according to a specified "decay function". We then periodically retrain the models on the current sample. This approach speeds up the training process relative to training on all of the data. Moreover, time-biasing lets the models adapt to recent changes in the data while---unlike in a sliding-window approach---still keeping some old data to ensure robustness in the face of temporary fluctuations and periodicities in the data values. In addition, the sampling-based approach allows existing analytic algorithms for static data to be applied to dynamic streaming data essentially without change. We provide and analyze both a simple sampling scheme (T-TBS) that probabilistically maintains a target sample size and a novel reservoir-based scheme (R-TBS) that is the first to provide both control over the decay rate and a guaranteed upper bound on the sample size. If the decay function is exponential, then control over the decay rate is complete, and R-TBS maximizes both expected sample size and sample-size stability. For general decay functions, the actual item inclusion probabilities can be made arbitrarily close to the nominal probabilities, and we provide a scheme that allows a trade-off between sample footprint and sample-size stability. The R-TBS and T-TBS schemes are of independent interest, extending the known set of unequal-probability sampling schemes. We discuss distributed implementation strategies; experiments in Spark illuminate the performance and scalability of the algorithms, and show that our approach can increase machine learning robustness in the face of evolving data.

* 49 pages, 18 figures. arXiv admin note: substantial text overlap with arXiv:1801.09709

View paper on

Share this with someone who'll enjoy it:

Title:Temporally-Biased Sampling Schemes for Online Model Management

Paper and Code