Daphna Weinshall

Active Learning with a Noisy Annotator

Apr 06, 2025

Netta Shafir, Guy Hacohen, Daphna Weinshall

Abstract: Active Learning (AL) aims to reduce annotation costs by strategically selecting the most informative samples for labeling. However, most active learning methods struggle in the low-budget regime where only a few labeled examples are available. This issue becomes even more pronounced when annotators provide noisy labels. A common AL approach for the low- and mid-budget regimes focuses on maximizing the coverage of the labeled set across the entire dataset. We propose a novel framework called Noise-Aware Active Sampling (NAS) that extends existing greedy, coverage-based active learning strategies to handle noisy annotations. NAS identifies regions that remain uncovered due to the selection of noisy representatives and enables resampling from these areas. We introduce a simple yet effective noise filtering approach suitable for the low-budget regime, which leverages the inner mechanism of NAS and can be applied for noise filtering before model training. On multiple computer vision benchmarks, including CIFAR100 and ImageNet subsets, NAS significantly improves performance for standard active learning methods across different noise types and rates.
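
The greedy, coverage-based selection that this abstract builds on can be illustrated with a minimal sketch. It is not the NAS algorithm itself: the noise-aware resampling and filtering steps are not reproduced here. The function name `greedy_coverage_selection`, the fixed ball `radius`, and the use of Euclidean distance on raw feature vectors are all assumptions made for illustration.

```python
import numpy as np

def greedy_coverage_selection(points, radius, budget):
    """Greedily pick up to `budget` indices to query for labels.

    At each step, choose the point whose ball of the given radius
    contains the most points that are not yet covered.
    Hypothetical helper for illustration, not the paper's code.
    """
    points = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances; within[i, j] is True if j lies in the ball around i.
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    within = dists <= radius
    covered = np.zeros(len(points), dtype=bool)
    selected = []
    for _ in range(budget):
        # Number of still-uncovered points each candidate would newly cover.
        gains = (within & ~covered).sum(axis=1)
        best = int(np.argmax(gains))
        if gains[best] == 0:
            break  # everything is already covered
        selected.append(best)
        covered |= within[best]
    return selected, covered
```

Under this sketch, a noise-aware variant would presumably need a way to mark some covered regions as uncovered again once their representative is suspected to carry a wrong label; how NAS does that is described in the paper, not here.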

On Local Overfitting and Forgetting in Deep Neural Networks

Dec 17, 2024

Uri Stern, Tomer Yaacoby, Daphna Weinshall

Abstract: The infrequent occurrence of overfitting in deep neural networks is perplexing: contrary to theoretical expectations, increasing model size often enhances performance in practice. But what if overfitting does occur, though restricted to specific sub-regions of the data space? In this work, we propose a novel score that captures the forgetting rate of deep models on validation data. We posit that this score quantifies local overfitting: a decline in performance confined to certain regions of the data space. We then show empirically that local overfitting occurs regardless of the presence of traditional overfitting. Using the framework of deep over-parametrized linear models, we offer a certain theoretical characterization of forgotten knowledge, and show that it correlates with knowledge forgotten by real deep models. Finally, we devise a new ensemble method that aims to recover forgotten knowledge, relying solely on the training history of a single network. When combined with self-distillation, this method enhances the performance of any trained model without adding inference costs. Extensive empirical evaluations demonstrate the efficacy of our method across multiple datasets, contemporary neural network architectures, and training protocols.

* to appear in AAAI-25

DCoM: Active Learning for All Learners

Jul 01, 2024

Inbal Mishal, Daphna Weinshall

Abstract: Deep Active Learning (AL) techniques can be effective in reducing annotation costs for training deep models. However, their effectiveness in low- and high-budget scenarios seems to require different strategies, and achieving optimal results across varying budget scenarios remains a challenge. In this study, we introduce Dynamic Coverage & Margin mix (DCoM), a novel active learning approach designed to bridge this gap. Unlike existing strategies, DCoM dynamically adjusts its strategy, considering the competence of the current model. Through theoretical analysis and empirical evaluations on diverse datasets, including challenging computer vision tasks, we demonstrate DCoM's ability to overcome the cold start problem and consistently improve results across different budgetary constraints. Thus, DCoM achieves state-of-the-art performance in both low- and high-budget regimes.

TEAL: New Selection Strategy for Small Buffers in Experience Replay Class Incremental Learning

Jun 30, 2024

Shahar Shaul-Ariel, Daphna Weinshall

Abstract: Continual Learning is an unresolved challenge, whose relevance increases when considering modern applications. Unlike the human brain, trained deep neural networks suffer from a phenomenon called Catastrophic Forgetting, where they progressively lose previously acquired knowledge upon learning new tasks. To mitigate this problem, numerous methods have been developed, many relying on replaying past exemplars during new task training. However, as the memory allocated for replay decreases, the effectiveness of these approaches diminishes. On the other hand, maintaining a large memory for the purpose of replay is inefficient and often impractical. Here we introduce TEAL, a novel approach to populating the memory with exemplars that can be integrated with various experience-replay methods and significantly enhance their performance on small memory buffers. We show that TEAL improves the average accuracy of the SOTA method XDER as well as ER and ER-ACE on several image recognition benchmarks, with a small memory buffer of 1-3 exemplars per class in the final task. This confirms the hypothesis that when memory is scarce, it is best to prioritize the most typical data.

United We Stand: Using Epoch-wise Agreement of Ensembles to Combat Overfit

Oct 17, 2023

Uri Stern, Daniel Shwartz, Daphna Weinshall

Abstract: Deep neural networks have become the method of choice for solving many image classification tasks, largely because they can fit very complex functions defined over raw images. The downside of such powerful learners is the danger of overfitting the training set, leading to poor generalization, which is usually avoided by regularization and "early stopping" of the training. In this paper, we propose a new deep network ensemble classifier that is very effective against overfit. We begin with the theoretical analysis of a regression model, whose prediction, that the variance among classifiers increases when overfit occurs, is demonstrated empirically in deep networks in common use. Guided by these results, we construct a new ensemble-based prediction method designed to combat overfit, where the prediction is determined by the most consensual prediction throughout the training. On multiple image and text classification datasets, we show that when regular ensembles suffer from overfit, our method eliminates the harmful reduction in generalization due to overfit, and often even surpasses the performance obtained by early stopping. Our method is easy to implement, and can be integrated with any training scheme and architecture, without additional prior knowledge beyond the training set. Accordingly, it is a practical and useful tool to overcome overfit.
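
One plausible reading of "the most consensual prediction throughout the training" is sketched below. It assumes that the predicted class of every ensemble member was stored at every epoch, and it returns, for each example, the class that reached the largest number of agreeing models at any single epoch. The function name `max_agreement_prediction` and the array layout are hypothetical, and the exact rule used in the paper may differ.

```python
import numpy as np

def max_agreement_prediction(preds, num_classes):
    """Pick, per example, the class with the strongest agreement at any epoch.

    preds: integer array of shape (epochs, models, examples) holding the
    class predicted by each ensemble member at each epoch (assumed layout).
    """
    preds = np.asarray(preds)
    # votes[e, c, n]: number of models predicting class c for example n at epoch e.
    votes = np.stack([(preds == c).sum(axis=1) for c in range(num_classes)], axis=1)
    # Strongest agreement each class reached at any epoch: shape (classes, examples).
    best_per_class = votes.max(axis=0)
    # Ties are resolved in favour of the lowest class index (np.argmax behaviour).
    return best_per_class.argmax(axis=0)
```

A regular ensemble would vote using only the last epoch. In this sketch, an example on which all models agreed early in training keeps that early consensus even if the members drift apart later.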

Relearning Forgotten Knowledge: on Forgetting, Overfit and Training-Free Ensembles of DNNs

Oct 17, 2023

Uri Stern, Daphna Weinshall

Abstract: The infrequent occurrence of overfit in deep neural networks is perplexing. On the one hand, theory predicts that as models get larger they should eventually become too specialized for a specific training set, with ensuing decrease in generalization. In contrast, empirical results in image classification indicate that increasing the training time of deep models or using bigger models almost never hurts generalization. Is it because the way we measure overfit is too limited? Here, we introduce a novel score for quantifying overfit, which monitors the forgetting rate of deep models on validation data. Presumably, this score indicates that even while generalization improves overall, there are certain regions of the data space where it deteriorates. When thus measured, we show that overfit can occur with and without a decrease in validation accuracy, and may be more common than previously appreciated. This observation may help to clarify the aforementioned confusing picture. We use our observations to construct a new ensemble method, based solely on the training history of a single network, which provides significant improvement in performance without any additional cost in training time. An extensive empirical evaluation with modern deep models shows our method's utility on multiple datasets, neural network architectures and training schemes, both when training from scratch and when using pre-trained networks in transfer learning. Notably, our method outperforms comparable methods while being easier to implement and use, and further improves the performance of competitive networks on ImageNet by 1%.
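
To make the idea of a forgetting rate on validation data concrete, here is a minimal sketch under one assumed definition: a validation example counts as forgotten if the network classified it correctly at some earlier epoch but misclassifies it at the final epoch. The name `forget_fraction` and this exact definition are assumptions for illustration; the score defined in the paper may be normalized or computed differently.

```python
import numpy as np

def forget_fraction(correct):
    """Fraction of validation examples that were learned and later forgotten.

    correct: boolean array of shape (epochs, examples); correct[e, n] is True
    if example n was classified correctly after epoch e (assumed input).
    """
    correct = np.asarray(correct, dtype=bool)
    # Correct at least once before the final epoch.
    ever_correct_before = correct[:-1].any(axis=0)
    # Learned at some point, but wrong at the end of training.
    forgotten = ever_correct_before & ~correct[-1]
    return float(forgotten.mean()), forgotten
```

Note that this quantity can be positive even when final validation accuracy is at its maximum over training, which matches the abstract's point that such local deterioration can occur without a drop in overall validation accuracy.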

Semi-Supervised Learning in the Few-Shot Zero-Shot Scenario

Aug 27, 2023

Noam Fluss, Guy Hacohen, Daphna Weinshall

Abstract: Semi-Supervised Learning (SSL) leverages both labeled and unlabeled data to improve model performance. Traditional SSL methods assume that labeled and unlabeled data share the same label space. However, in real-world applications, especially when the labeled training set is small, there may be classes that are missing from the labeled set. Existing frameworks aim to either reject all unseen classes (open-set SSL) or to discover unseen classes by partitioning an unlabeled set during training (open-world SSL). In our work, we construct a classifier for points from both seen and unseen classes. Our approach is based on extending an existing SSL method, such as FlexMatch, by incorporating an additional entropy loss. This enhancement allows our method to improve the performance of any existing SSL method in the classification of both seen and unseen classes. We demonstrate large gains over state-of-the-art SSL, open-set SSL, and open-world SSL methods, on two benchmark image classification datasets, CIFAR-100 and STL-10. The gains are most pronounced when the labeled data is severely limited (1-25 labeled examples per class).

Pruning the Unlabeled Data to Improve Semi-Supervised Learning

Aug 27, 2023

Guy Hacohen, Daphna Weinshall

Abstract: In the domain of semi-supervised learning (SSL), the conventional approach involves training a learner with a limited amount of labeled data alongside a substantial volume of unlabeled data, both drawn from the same underlying distribution. However, for deep learning models, this standard practice may not yield optimal results. In this research, we propose an alternative perspective, suggesting that distributions that are more readily separable could offer superior benefits to the learner as compared to the original distribution. To achieve this, we present PruneSSL, a practical technique for selectively removing examples from the original unlabeled dataset to enhance its separability. We present an empirical study, showing that although PruneSSL reduces the quantity of available training data for the learner, it significantly improves the performance of various competitive SSL algorithms, thereby achieving state-of-the-art results across several image classification tasks.

How to Select Which Active Learning Strategy is Best Suited for Your Specific Problem and Budget

Jun 06, 2023

Guy Hacohen, Daphna Weinshall

Abstract: In Active Learning (AL), a learner actively chooses which unlabeled examples to query for labels from an oracle, under some budget constraints. Different AL query strategies are more suited to different problems and budgets. Therefore, in practice, knowing in advance which AL strategy is most suited for the problem at hand remains an open problem. To tackle this challenge, we propose a practical derivative-based method that dynamically identifies the best strategy for each budget. We provide theoretical analysis of a simplified case to motivate our approach and build intuition. We then introduce a method to dynamically select an AL strategy based on the specific problem and budget. Empirical results showcase the effectiveness of our approach across diverse budgets and computer vision tasks.

The Dynamic of Consensus in Deep Networks and the Identification of Noisy Labels

Oct 02, 2022

Daniel Shwartz, Uri Stern, Daphna Weinshall

Abstract: Deep neural networks have incredible capacity and expressibility, and can seemingly memorize any training set. This introduces a problem when training in the presence of noisy labels, as the noisy examples cannot be distinguished from clean examples by the end of training. Recent research has dealt with this challenge by utilizing the fact that deep networks seem to memorize clean examples much earlier than noisy examples. Here we report a new empirical result: for each example, when looking at the time it has been memorized by each model in an ensemble of networks, the diversity seen in noisy examples is much larger than in clean examples. We use this observation to develop a new method for noisy label filtration. The method is based on a statistic of the data, which captures the differences in ensemble learning dynamics between clean and noisy data. We test our method on three tasks: (i) noise amount estimation; (ii) noise filtration; (iii) supervised classification. We show that our method improves over existing baselines in all three tasks using a variety of datasets, noise models, and noise levels. Aside from its improved performance, our method has two other advantages. (i) Simplicity, which implies that no additional hyperparameters are introduced. (ii) Our method is modular: it does not work in an end-to-end fashion, and can therefore be used to clean a dataset for any other future usage.
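
The observation about memorization times can be illustrated with a small sketch. It assumes that, for each ensemble member and each epoch, we recorded whether the model's prediction on a training example matched its given (possibly noisy) label. The memorization time is taken here as the first epoch at which the label is fitted, and the diversity as the standard deviation of that time across models. Both choices, and the name `learning_time_spread`, are assumptions for illustration and not necessarily the statistic used in the paper.

```python
import numpy as np

def learning_time_spread(correct):
    """Per-example spread of first-fit epochs across an ensemble.

    correct: boolean array of shape (models, epochs, examples); correct[m, e, n]
    is True if model m fits the given label of example n at epoch e (assumed input).
    """
    correct = np.asarray(correct, dtype=bool)
    num_epochs = correct.shape[1]
    # argmax over the epoch axis returns the first True entry for each model/example.
    first = correct.argmax(axis=1)
    # Examples a model never fits are assigned a time one past the last epoch.
    never = ~correct.any(axis=1)
    first = np.where(never, num_epochs, first)
    # Population standard deviation across models, one value per example.
    return first.std(axis=0)
```

Under the abstract's claim, examples with a large spread would be the more likely candidates for noisy labels, so thresholding or ranking this value gives a simple filter to experiment with.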
