Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Lisha Chen

Bilevel Joint Unsupervised and Supervised Training for Automatic Speech Recognition

Dec 11, 2024

Xiaodong Cui, A F M Saif, Songtao Lu, Lisha Chen, Tianyi Chen, Brian Kingsbury, George Saon

Abstract:In this paper, we propose a bilevel joint unsupervised and supervised training (BL-JUST) framework for automatic speech recognition. Compared to the conventional pre-training and fine-tuning strategy which is a disconnected two-stage process, BL-JUST tries to optimize an acoustic model such that it simultaneously minimizes both the unsupervised and supervised loss functions. Because BL-JUST seeks matched local optima of both loss functions, acoustic representations learned by the acoustic model strike a good balance between being generic and task-specific. We solve the BL-JUST problem using penalty-based bilevel gradient descent and evaluate the trained deep neural network acoustic models on various datasets with a variety of architectures and loss functions. We show that BL-JUST can outperform the widely-used pre-training and fine-tuning strategy and some other popular semi-supervised techniques.

* Accepted by IEEE/ACM Transactions on Audio, Speech and Language Processing

Via

Access Paper or Ask Questions

FERERO: A Flexible Framework for Preference-Guided Multi-Objective Learning

Dec 02, 2024

Lisha Chen, AFM Saif, Yanning Shen, Tianyi Chen

Abstract:Finding specific preference-guided Pareto solutions that represent different trade-offs among multiple objectives is critical yet challenging in multi-objective problems. Existing methods are restrictive in preference definitions and/or their theoretical guarantees. In this work, we introduce a Flexible framEwork for pREfeRence-guided multi-Objective learning (FERERO) by casting it as a constrained vector optimization problem. Specifically, two types of preferences are incorporated into this formulation -- the relative preference defined by the partial ordering induced by a polyhedral cone, and the absolute preference defined by constraints that are linear functions of the objectives. To solve this problem, convergent algorithms are developed with both single-loop and stochastic variants. Notably, this is the first single-loop primal algorithm for constrained vector optimization to our knowledge. The proposed algorithms adaptively adjust to both constraint and objective values, eliminating the need to solve different subproblems at different stages of constraint satisfaction. Experiments on multiple benchmarks demonstrate the proposed method is very competitive in finding preference-guided optimal solutions. Code is available at https://github.com/lisha-chen/FERERO/.

Via

Access Paper or Ask Questions

Three-Way Trade-Off in Multi-Objective Learning: Optimization, Generalization and Conflict-Avoidance

May 31, 2023

Lisha Chen, Heshan Fernando, Yiming Ying, Tianyi Chen

Figure 1 for Three-Way Trade-Off in Multi-Objective Learning: Optimization, Generalization and Conflict-Avoidance

Figure 2 for Three-Way Trade-Off in Multi-Objective Learning: Optimization, Generalization and Conflict-Avoidance

Figure 3 for Three-Way Trade-Off in Multi-Objective Learning: Optimization, Generalization and Conflict-Avoidance

Figure 4 for Three-Way Trade-Off in Multi-Objective Learning: Optimization, Generalization and Conflict-Avoidance

Abstract:Multi-objective learning (MOL) problems often arise in emerging machine learning problems when there are multiple learning criteria or multiple learning tasks. Recent works have developed various dynamic weighting algorithms for MOL such as MGDA and its variants, where the central idea is to find an update direction that avoids conflicts among objectives. Albeit its appealing intuition, empirical studies show that dynamic weighting methods may not always outperform static ones. To understand this theory-practical gap, we focus on a new stochastic variant of MGDA - the Multi-objective gradient with Double sampling (MoDo) algorithm, and study the generalization performance of the dynamic weighting-based MoDo and its interplay with optimization through the lens of algorithm stability. Perhaps surprisingly, we find that the key rationale behind MGDA -- updating along conflict-avoidant direction - may hinder dynamic weighting algorithms from achieving the optimal ${\cal O}(1/\sqrt{n})$ population risk, where $n$ is the number of training samples. We further demonstrate the variability of dynamic weights on the three-way trade-off among optimization, generalization, and conflict avoidance that is unique in MOL.

Via

Access Paper or Ask Questions

Learning with Limited Samples -- Meta-Learning and Applications to Communication Systems

Oct 03, 2022

Lisha Chen, Sharu Theresa Jose, Ivana Nikoloska, Sangwoo Park, Tianyi Chen, Osvaldo Simeone

Figure 1 for Learning with Limited Samples -- Meta-Learning and Applications to Communication Systems

Figure 2 for Learning with Limited Samples -- Meta-Learning and Applications to Communication Systems

Figure 3 for Learning with Limited Samples -- Meta-Learning and Applications to Communication Systems

Figure 4 for Learning with Limited Samples -- Meta-Learning and Applications to Communication Systems

Abstract:Deep learning has achieved remarkable success in many machine learning tasks such as image classification, speech recognition, and game playing. However, these breakthroughs are often difficult to translate into real-world engineering systems because deep learning models require a massive number of training samples, which are costly to obtain in practice. To address labeled data scarcity, few-shot meta-learning optimizes learning algorithms that can efficiently adapt to new tasks quickly. While meta-learning is gaining significant interest in the machine learning literature, its working principles and theoretic fundamentals are not as well understood in the engineering community. This review monograph provides an introduction to meta-learning by covering principles, algorithms, theory, and engineering applications. After introducing meta-learning in comparison with conventional and joint learning, we describe the main meta-learning algorithms, as well as a general bilevel optimization framework for the definition of meta-learning techniques. Then, we summarize known results on the generalization capabilities of meta-learning from a statistical learning viewpoint. Applications to communication systems, including decoding and power allocation, are discussed next, followed by an introduction to aspects related to the integration of meta-learning with emerging computing technologies, namely neuromorphic and quantum computing. The monograph is concluded with an overview of open research challenges.

* This is the first draft of this monograph submitted for review. Comments are very welcome

Via

Access Paper or Ask Questions

Understanding Benign Overfitting in Nested Meta Learning

Jun 27, 2022

Lisha Chen, Songtao Lu, Tianyi Chen

Figure 1 for Understanding Benign Overfitting in Nested Meta Learning

Figure 2 for Understanding Benign Overfitting in Nested Meta Learning

Figure 3 for Understanding Benign Overfitting in Nested Meta Learning

Figure 4 for Understanding Benign Overfitting in Nested Meta Learning

Abstract:Meta learning has demonstrated tremendous success in few-shot learning with limited supervised data. In those settings, the meta model is usually overparameterized. While the conventional statistical learning theory suggests that overparameterized models tend to overfit, empirical evidence reveals that overparameterized meta learning methods still work well -- a phenomenon often called ``benign overfitting.'' To understand this phenomenon, we focus on the meta learning settings with a challenging nested structure that we term the nested meta learning, and analyze its generalization performance under an overparameterized meta learning model. While our analysis uses the relatively tractable linear models, our theory contributes to understanding the delicate interplay among data heterogeneity, model adaptation and benign overfitting in nested meta learning tasks. We corroborate our theoretical claims through numerical simulations.

* Conference submission in May 2022

Via

Access Paper or Ask Questions

Sharp-MAML: Sharpness-Aware Model-Agnostic Meta Learning

Jun 10, 2022

Momin Abbas, Quan Xiao, Lisha Chen, Pin-Yu Chen, Tianyi Chen

Figure 1 for Sharp-MAML: Sharpness-Aware Model-Agnostic Meta Learning

Figure 2 for Sharp-MAML: Sharpness-Aware Model-Agnostic Meta Learning

Figure 3 for Sharp-MAML: Sharpness-Aware Model-Agnostic Meta Learning

Figure 4 for Sharp-MAML: Sharpness-Aware Model-Agnostic Meta Learning

Abstract:Model-agnostic meta learning (MAML) is currently one of the dominating approaches for few-shot meta-learning. Albeit its effectiveness, the optimization of MAML can be challenging due to the innate bilevel problem structure. Specifically, the loss landscape of MAML is much more complex with possibly more saddle points and local minimizers than its empirical risk minimization counterpart. To address this challenge, we leverage the recently invented sharpness-aware minimization and develop a sharpness-aware MAML approach that we term Sharp-MAML. We empirically demonstrate that Sharp-MAML and its computation-efficient variant can outperform popular existing MAML baselines (e.g., $+12\%$ accuracy on Mini-Imagenet). We complement the empirical study with the convergence rate analysis and the generalization bound of Sharp-MAML. To the best of our knowledge, this is the first empirical and theoretical study on sharpness-aware minimization in the context of bilevel learning. The code is available at https://github.com/mominabbass/Sharp-MAML.

* accepted to ICML 2022

Via

Access Paper or Ask Questions

Is Bayesian Model-Agnostic Meta Learning Better than Model-Agnostic Meta Learning, Provably?

Mar 06, 2022

Lisha Chen, Tianyi Chen

Figure 1 for Is Bayesian Model-Agnostic Meta Learning Better than Model-Agnostic Meta Learning, Provably?

Figure 2 for Is Bayesian Model-Agnostic Meta Learning Better than Model-Agnostic Meta Learning, Provably?

Figure 3 for Is Bayesian Model-Agnostic Meta Learning Better than Model-Agnostic Meta Learning, Provably?

Figure 4 for Is Bayesian Model-Agnostic Meta Learning Better than Model-Agnostic Meta Learning, Provably?

Abstract:Meta learning aims at learning a model that can quickly adapt to unseen tasks. Widely used meta learning methods include model agnostic meta learning (MAML), implicit MAML, Bayesian MAML. Thanks to its ability of modeling uncertainty, Bayesian MAML often has advantageous empirical performance. However, the theoretical understanding of Bayesian MAML is still limited, especially on questions such as if and when Bayesian MAML has provably better performance than MAML. In this paper, we aim to provide theoretical justifications for Bayesian MAML's advantageous performance by comparing the meta test risks of MAML and Bayesian MAML. In the meta linear regression, under both the distribution agnostic and linear centroid cases, we have established that Bayesian MAML indeed has provably lower meta test risks than MAML. We verify our theoretical results through experiments.

Via

Access Paper or Ask Questions

Deep Structured Prediction for Facial Landmark Detection

Oct 18, 2020

Lisha Chen, Hui Su, Qiang Ji

Figure 1 for Deep Structured Prediction for Facial Landmark Detection

Figure 2 for Deep Structured Prediction for Facial Landmark Detection

Figure 3 for Deep Structured Prediction for Facial Landmark Detection

Figure 4 for Deep Structured Prediction for Facial Landmark Detection

Abstract:Existing deep learning based facial landmark detection methods have achieved excellent performance. These methods, however, do not explicitly embed the structural dependencies among landmark points. They hence cannot preserve the geometric relationships between landmark points or generalize well to challenging conditions or unseen data. This paper proposes a method for deep structured facial landmark detection based on combining a deep Convolutional Network with a Conditional Random Field. We demonstrate its superior performance to existing state-of-the-art techniques in facial landmark detection, especially a better generalization ability on challenging datasets that include large pose and occlusion.

* Accepted by NeurIPS 2019

Via

Access Paper or Ask Questions