Abstract:This work continues to investigate the link between differentially private (DP) and online learning. Alon, Livni, Malliaris, and Moran (2019) showed that for binary concept classes, DP learnability of a given class implies that it has a finite Littlestone dimension (equivalently, that it is online learnable). Their proof relies on a model-theoretic result by Hodges (1997), which demonstrates that any binary concept class with a large Littlestone dimension contains a large subclass of thresholds. In a follow-up work, Jung, Kim, and Tewari (2020) extended this proof to multiclass PAC learning with a bounded number of labels. Unfortunately, Hodges's result does not apply in other natural settings such as multiclass PAC learning with an unbounded label space, and PAC learning of partial concept classes. This naturally raises the question of whether DP learnability continues to imply online learnability in more general scenarios: indeed, Alon, Hanneke, Holzman, and Moran (2021) explicitly leave it as an open question in the context of partial concept classes, and the same question is open in the general multiclass setting. In this work, we give a positive answer to these questions showing that for general classification tasks, DP learnability implies online learnability. Our proof reasons directly about Littlestone trees, without relying on thresholds. We achieve this by establishing several Ramsey-type theorems for trees, which might be of independent interest.
Abstract:We study multiclass online prediction where the learner can predict using a list of multiple labels (as opposed to just one label in the traditional setting). We characterize learnability in this model using the $b$-ary Littlestone dimension. This dimension is a variation of the classical Littlestone dimension with the difference that binary mistake trees are replaced with $(k+1)$-ary mistake trees, where $k$ is the number of labels in the list. In the agnostic setting, we explore different scenarios depending on whether the comparator class consists of single-labeled or multi-labeled functions and its tradeoff with the size of the lists the algorithm uses. We find that it is possible to achieve negative regret in some cases and provide a complete characterization of when this is possible. As part of our work, we adapt classical algorithms such as Littlestone's SOA and Rosenblatt's Perceptron to predict using lists of labels. We also establish combinatorial results for list-learnable classes, including an list online version of the Sauer-Shelah-Perles Lemma. We state our results within the framework of pattern classes -- a generalization of hypothesis classes which can represent adaptive hypotheses (i.e. functions with memory), and model data-dependent assumptions such as linear classification with margin.