Title: CheckSel: Efficient and Accurate Data-valuation Through Online Checkpoint Selection

Mar 14, 2022

Soumi Das, Manasvi Sagarkar, Suparna Bhattacharya, Sourangshu Bhattacharya

Abstract: Data valuation and subset selection have emerged as valuable tools for application-specific selection of important training data. However, the efficiency-accuracy tradeoffs of state-of-the-art methods hinder their widespread application to many AI workflows. In this paper, we propose a novel two-phase solution to this problem. Phase 1 selects representative checkpoints from an SGD-like training algorithm; phase 2 uses these checkpoints to estimate approximate training data values, e.g., the decrease in validation loss due to each training point. A key contribution of this paper is CheckSel, an Orthogonal Matching Pursuit-inspired online sparse approximation algorithm for checkpoint selection in the online setting, where the features are revealed one at a time. Another key contribution is the study of data valuation in the domain adaptation setting, where a data value estimator obtained using checkpoints from the training trajectory on the source domain training dataset is used for data valuation on a target domain training dataset. Experimental results on benchmark datasets show that the proposed algorithm outperforms recent baseline methods by up to 30% in terms of test accuracy while incurring a similar computational burden, in both the standalone and domain adaptation settings.
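
For orientation, the classical offline Orthogonal Matching Pursuit that the abstract names as CheckSel's inspiration can be sketched as follows. This is the textbook batch algorithm, not CheckSel itself: CheckSel operates online, with features revealed one at a time, and the abstract does not give its update rule. The function name `omp` and the arguments `features`, `target`, and `k` are illustrative assumptions, as is the reading of each column as one candidate checkpoint's feature vector.

```python
import numpy as np

def omp(features, target, k):
    """Textbook (offline) Orthogonal Matching Pursuit.

    Greedily picks up to k columns of `features` whose span best
    approximates `target` in the least-squares sense.
    Hypothetical reading: each column is one candidate checkpoint.
    """
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    n_cols = features.shape[1]
    chosen = np.zeros(n_cols, dtype=bool)   # marks columns already picked
    selected = []
    weights = np.zeros(0)
    residual = target.copy()
    norms = np.maximum(np.linalg.norm(features, axis=0), 1e-12)

    for _ in range(min(k, n_cols)):
        # Score every column by its normalized correlation with the residual.
        scores = np.abs(features.T @ residual) / norms
        scores[chosen] = -np.inf            # never pick a column twice
        best = int(np.argmax(scores))
        chosen[best] = True
        selected.append(best)
        # Refit the weights on all selected columns, then update the residual.
        sub = features[:, selected]
        weights, *_ = np.linalg.lstsq(sub, target, rcond=None)
        residual = target - sub @ weights

    return selected, weights, residual
```

The least-squares refit after each pick is what separates OMP from plain matching pursuit: the residual stays orthogonal to every column selected so far. An online variant would have to make keep-or-discard decisions as each new column arrives rather than scanning all columns at every step.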
