Abstract:We introduce a relevant yet challenging problem named Personalized Dictionary Learning (PerDL), where the goal is to learn sparse linear representations from heterogeneous datasets that share some commonality. In PerDL, we model each dataset's shared and unique features as global and local dictionaries. Challenges for PerDL not only are inherited from classical dictionary learning (DL), but also arise due to the unknown nature of the shared and unique features. In this paper, we rigorously formulate this problem and provide conditions under which the global and local dictionaries can be provably disentangled. Under these conditions, we provide a meta-algorithm called Personalized Matching and Averaging (PerMA) that can recover both global and local dictionaries from heterogeneous datasets. PerMA is highly efficient; it converges to the ground truth at a linear rate under suitable conditions. Moreover, it automatically borrows strength from strong learners to improve the prediction of weak learners. As a general framework for extracting global and local dictionaries, we show the application of PerDL in different learning tasks, such as training with imbalanced datasets and video surveillance.
Abstract:This paper focuses on complete dictionary learning problem, where the goal is to reparametrize a set of given signals as linear combinations of atoms from a learned dictionary. There are two main challenges faced by theoretical and practical studies of dictionary learning: the lack of theoretical guarantees for practically-used heuristic algorithms, and their poor scalability when dealing with huge-scale datasets. Towards addressing these issues, we show that when the dictionary to be learned is orthogonal, that an alternating minimization method directly applied to the nonconvex and discrete formulation of the problem exactly recovers the ground truth. For the huge-scale, potentially online setting, we propose a minibatch version of our algorithm, which can provably learn a complete dictionary from a huge-scale dataset with minimal sample complexity, linear sparsity level, and linear convergence rate, thereby negating the need for any convex relaxation for the problem. Our numerical experiments showcase the superiority of our method compared with the existing techniques when applied to tasks on real data.