Ruan van der Merwe

Mitigating Catastrophic Forgetting for Few-Shot Spoken Word Classification Through Meta-Learning

May 22, 2023

Ruan van der Merwe, Herman Kamper

Abstract: We consider the problem of few-shot spoken word classification in a setting where a model is incrementally introduced to new word classes. This would occur in a user-defined keyword system where new words can be added as the system is used. In such a continual learning scenario, a model might start to misclassify earlier words as newer classes are added, i.e. catastrophic forgetting. To address this, we propose an extension to model-agnostic meta-learning (MAML): each inner learning loop, where a model "learns how to learn" new classes, ends with a single gradient update using stored templates from all the classes that the model has already seen (one template per class). We compare this method to OML (another extension of MAML) in few-shot isolated-word classification experiments on Google Commands and FACC. Our method consistently outperforms OML in experiments where the number of shots and the final number of classes are varied.

* 5 pages, 3 figures, Accepted to Interspeech 2023
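
The inner-loop idea in the abstract above can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a plain linear softmax classifier over hypothetical fixed-length feature vectors, and it leaves out the MAML outer (meta-training) loop entirely. The names `inner_loop`, `gradient_step`, `support` and `templates` are made up for illustration. It only shows the ordering described in the abstract: updates on the new class's examples, followed by one gradient update on a batch holding one stored template per class seen so far.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(W, x):
    # Class probabilities of a linear classifier with weight rows W.
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in W])

def mean_loss(W, batch):
    # Mean cross-entropy over a batch of (features, label) pairs.
    return sum(-math.log(predict(W, x)[y]) for x, y in batch) / len(batch)

def gradient_step(W, batch, lr):
    # One gradient-descent step on the mean cross-entropy of the batch.
    grad = [[0.0] * len(row) for row in W]
    for x, y in batch:
        p = predict(W, x)
        for k in range(len(W)):
            err = p[k] - (1.0 if k == y else 0.0)
            for i, xi in enumerate(x):
                grad[k][i] += err * xi / len(batch)
    return [[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(W, grad)]

def inner_loop(W, support, templates, lr=0.5):
    # Assumption: one update per support example of the new class.
    for example in support:
        W = gradient_step(W, [example], lr)
    # Final single update on one stored template per class seen so far.
    if templates:
        template_batch = [(x, label) for label, x in templates.items()]
        W = gradient_step(W, template_batch, lr)
    return W
```

Here `templates` maps a class label to its single stored feature vector, and `support` is a list of `(features, label)` pairs for the newly added class. In the method described by the abstract, this loop sits inside meta-learning; the sketch covers only the adaptation step.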

Manifold Characteristics That Predict Downstream Task Performance

May 16, 2022

Ruan van der Merwe, Gregory Newman, Etienne Barnard

Abstract: Pretraining methods are typically compared by evaluating the accuracy of linear classifiers, transfer learning performance, or visually inspecting the representation manifold's (RM) lower-dimensional projections. We show that the differences between methods can be understood more clearly by investigating the RM directly, which allows for a more detailed comparison. To this end, we propose a framework and new metric to measure and compare different RMs. We also investigate and report on the RM characteristics for various pretraining methods. These characteristics are measured by applying sequentially larger local alterations to the input data, using white noise injections and Projected Gradient Descent (PGD) adversarial attacks, and then tracking each datapoint. We calculate the total distance moved for each datapoint and the relative change in distance between successive alterations. We show that self-supervised methods learn an RM where alterations lead to large but constant size changes, indicating a smoother RM than fully supervised methods. We then combine these measurements into one metric, the Representation Manifold Quality Metric (RMQM), where larger values indicate larger and less variable step sizes, and show that RMQM correlates positively with performance on downstream tasks.

* Currently under review
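
The two per-datapoint measurements named in the abstract above (total distance moved, and relative change in distance between successive alterations) might be computed along the following lines. This is a sketch under assumptions the abstract does not confirm: Euclidean distance, total distance taken as the sum of successive step sizes, and relative change taken as the fractional difference between consecutive steps. The function names are hypothetical, and the RMQM formula itself is not reproduced because the abstract does not state it.

```python
import math

def step_distances(reps):
    # reps[0] is the representation of the unaltered input; reps[i] is the
    # representation after the i-th (progressively larger) alteration.
    return [math.dist(a, b) for a, b in zip(reps, reps[1:])]

def total_distance(reps):
    # Assumption: total distance moved = sum of successive step sizes.
    return sum(step_distances(reps))

def relative_changes(reps):
    # Assumption: relative change = (next step - this step) / this step.
    # Steps of size zero are skipped to avoid division by zero.
    d = step_distances(reps)
    return [(b - a) / a for a, b in zip(d, d[1:]) if a > 0]
```

For a datapoint whose representation moves by equal steps at every alteration level, `relative_changes` returns zeros, which matches the abstract's description of "large but constant size changes".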

Triplet Entropy Loss: Improving The Generalisation of Short Speech Language Identification Systems

Dec 03, 2020

Ruan van der Merwe

Abstract: We present several methods to improve the generalisation of language identification (LID) systems to new speakers and to new domains. These methods include spectral augmentation, where frequency or time bands of the spectrograms are masked during training, and CNN architectures that are pre-trained on the ImageNet dataset. The paper also introduces the novel Triplet Entropy Loss training method, which involves training a network simultaneously using Cross Entropy and Triplet loss. It was found that all three methods improved the generalisation of the models, though not significantly. Even though the models trained using Triplet Entropy Loss showed a better understanding of the languages and higher accuracies, it appears as though the models still memorise word patterns present in the spectrograms rather than learning the finer nuances of a language. The research shows that Triplet Entropy Loss has great potential and should be investigated further, not only in language identification tasks but in any classification task.

* 22 pages, 26 figures, Code available at https://github.com/ruanvdmerwe/triplet-entropy-loss
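
As a rough illustration of training "simultaneously using Cross Entropy and Triplet loss", the sketch below adds the two losses for a single example. The unweighted sum, the Euclidean distance, and the `margin` value are assumptions, and the function names are hypothetical; the linked repository is the reference for how the paper actually combines the terms.

```python
import math

def cross_entropy(logits, label):
    # Numerically stable -log softmax(logits)[label].
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum - logits[label]

def triplet_loss(anchor, positive, negative, margin=1.0):
    # Standard margin-based triplet loss on embedding vectors.
    gap = math.dist(anchor, positive) - math.dist(anchor, negative)
    return max(0.0, gap + margin)

def triplet_entropy_loss(logits, label, anchor, positive, negative, margin=1.0):
    # Assumption: the two terms are summed with equal weight.
    return cross_entropy(logits, label) + triplet_loss(anchor, positive, negative, margin)
```

Here `logits` come from the classification head, while `anchor`, `positive` and `negative` are embeddings of the current utterance, one from the same language, and one from a different language.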
