Abstract: Universal Approximation Theorems establish the density of various classes of neural network function approximators in $C(K, \mathbb{R}^m)$, where $K \subset \mathbb{R}^n$ is compact. In this paper, we aim to extend these guarantees by establishing conditions on learning tasks that guarantee their continuity. We consider learning tasks given by conditional expectations $x \mapsto \mathrm{E}\left[Y \mid X = x\right]$, where the learning target $Y = f \circ L$ is a potentially pathological transformation of some underlying data-generating process $L$. Under a factorization $L = T \circ W$ of the data-generating process, where $T$ is thought of as a deterministic map acting on some random input $W$, we establish conditions (which may be easy to verify using knowledge of $T$ alone) that guarantee the continuity of practically \textit{any} derived learning task $x \mapsto \mathrm{E}\left[f \circ L \mid X = x\right]$. We motivate the realism of our conditions using the example of randomized stable matching, thus providing a theoretical justification for the continuity of real-world learning tasks.
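As a toy illustration of how a pathological $f$ can still yield a continuous learning task (an illustrative example under our own assumptions, not drawn from the paper), suppose $X$ and $W \sim \mathcal{N}(0, 1)$ are independent, take $L = X + W$, and let $f = \mathbf{1}_{(0, \infty)}$, which is discontinuous at $0$. Then

$$\mathrm{E}\left[f \circ L \mid X = x\right] = \mathrm{P}(x + W > 0) = \mathrm{P}(W > -x) = \Phi(x),$$

where $\Phi$ denotes the standard normal CDF. The map $x \mapsto \Phi(x)$ is continuous even though $f$ is not, because averaging over the random input $W$ smooths out the jump.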
Abstract: Recommendation systems are a key modern application of machine learning, but they often draw upon sensitive user information when making their predictions. We show how to address this deficiency by basing a service's recommendation engine upon recommendations from other existing services, which contain no sensitive information by nature. Specifically, we introduce a contextual multi-armed bandit recommendation framework where the agent has access to recommendations for other services. In our setting, the user's (potentially sensitive) information belongs to a high-dimensional latent space, and the ideal recommendations for the source and target tasks (which are non-sensitive) are given by unknown linear transformations of the user information. So long as the tasks rely on similar segments of the user information, we can decompose the target recommendation problem into systematic components that can be derived from the source recommendations, and idiosyncratic components that are user-specific and cannot be derived from the source, but have significantly lower dimensionality. We propose an explore-then-refine approach to learning and utilizing this decomposition. Then, using ideas from perturbation theory and statistical concentration of measure, we prove that our algorithm achieves regret comparable to a strong skyline that has full knowledge of the source and target transformations. We also consider a generalization of our algorithm to a model with many simultaneous targets and no source. Our methods obtain superior empirical results on synthetic benchmarks.
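The systematic/idiosyncratic decomposition described above can be sketched with plain linear algebra. The following is a minimal illustration, not the authors' explore-then-refine algorithm: it assumes both linear transformations are known (the bandit setting treats them as unknown), and the names `decompose_target`, `A_src`, `A_tgt`, and all dimensions are hypothetical. It projects the target transformation onto the row space of the source transformation and keeps the remainder as the idiosyncratic part.

```python
import numpy as np


def decompose_target(A_src, A_tgt):
    """Split A_tgt into a part routed through A_src plus a residual.

    Returns (M, R) with A_tgt = M @ A_src + R, where the rows of R are
    orthogonal to the row space of A_src.
    """
    M = A_tgt @ np.linalg.pinv(A_src)  # maps source recommendations to the systematic part
    R = A_tgt - M @ A_src              # idiosyncratic residual, acts on user info directly
    return M, R


# Hypothetical sizes: latent dimension n, source/target output sizes,
# and r directions that the target uses but the source does not.
rng = np.random.default_rng(0)
n, k_src, k_tgt, r = 50, 20, 5, 2

A_src = rng.standard_normal((k_src, n))
shared = rng.standard_normal((k_tgt, k_src)) @ A_src                  # derivable from the source
extra = rng.standard_normal((k_tgt, r)) @ rng.standard_normal((r, n))  # not derivable
A_tgt = shared + extra

M, R = decompose_target(A_src, A_tgt)

u = rng.standard_normal(n)   # latent (sensitive) user information
source_rec = A_src @ u       # non-sensitive source recommendation
target_rec = A_tgt @ u       # ideal target recommendation

systematic = M @ source_rec  # computed from the source recommendation alone
idiosyncratic = R @ u        # still needs user-specific information
residual_rank = np.linalg.matrix_rank(R, tol=1e-8)

print("reconstruction error:", np.linalg.norm(systematic + idiosyncratic - target_rec))
print("rank of idiosyncratic part:", residual_rank, "of latent dimension", n)
```

In this synthetic setup the residual `R` has rank at most `r`, so only a low-dimensional projection of the user information remains to be learned once the source recommendation is available, which mirrors the dimensionality reduction the abstract refers to.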