Abstract: We present a general framework for designing efficient algorithms for unsupervised learning problems, such as learning mixtures of Gaussians and subspace clustering. Our framework is based on a meta-algorithm that learns arithmetic circuits in the presence of noise, using lower bounds for the underlying circuit class. This builds upon the recent work of Garg, Kayal and Saha (FOCS 20), who designed such a framework for learning arithmetic circuits without any noise. A key ingredient of our meta-algorithm is an efficient algorithm for a novel problem called Robust Vector Space Decomposition. We show that our meta-algorithm works whenever certain associated matrices have sufficiently large smallest non-zero singular values. We conjecture that this condition holds for smoothed instances of our problems, in which case our framework would yield efficient algorithms for these problems in the smoothed setting.
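The (noiseless) decomposition problem at the heart of the Garg-Kayal-Saha scheme, as described in a later abstract below, can be stated as follows; the robust variant, roughly speaking, asks for the same under perturbations of the input spaces (this formulation is a plausible reading, not a verbatim definition from the paper):
$$\text{Given vector spaces } U, V \text{ and linear maps } B_1, \ldots, B_k : U \to V, \text{ find } U = U_1 \oplus \cdots \oplus U_s,\; V = V_1 \oplus \cdots \oplus V_s \text{ such that } B_j(U_i) \subseteq V_i \text{ for all } i, j.$$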
Abstract: We study the complexity of optimizing highly smooth convex functions. For a positive integer $p$, we want to find an $\epsilon$-approximate minimum of a convex function $f$, given oracle access to the function and its first $p$ derivatives, assuming that the $p$th derivative of $f$ is Lipschitz. Recently, three independent research groups (Jiang et al., PMLR 2019; Gasnikov et al., PMLR 2019; Bubeck et al., PMLR 2019) developed a new algorithm that solves this problem with $\tilde{O}(1/\epsilon^{\frac{2}{3p+1}})$ oracle calls for constant $p$. This is known to be optimal (up to log factors) for deterministic algorithms, but known lower bounds for randomized algorithms do not match this bound. We prove a new lower bound that matches this bound (up to log factors) and that holds not only for randomized algorithms but also for quantum algorithms.
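As a sanity check on the exponent, setting $p = 1$ (a Lipschitz first derivative, i.e., the standard smooth convex setting) recovers a familiar rate:
$$\tilde{O}\left(1/\epsilon^{\frac{2}{3p+1}}\right)\Big|_{p=1} = \tilde{O}\left(1/\epsilon^{1/2}\right),$$
which matches the well-known $O(1/\sqrt{\epsilon})$ oracle complexity of Nesterov's accelerated gradient method for smooth convex minimization.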
Abstract: We develop algorithms for writing a polynomial as a sum of powers of low-degree polynomials. Consider an $n$-variate degree-$d$ polynomial $f$ which can be written as $$f = c_1Q_1^{m} + \ldots + c_s Q_s^{m},$$ where each $c_i\in \mathbb{F}^{\times}$, $Q_i$ is a homogeneous polynomial of degree $t$, and $tm = d$. In this paper, we give a $\text{poly}((ns)^t)$-time learning algorithm for finding the $Q_i$'s given (black-box access to) $f$, provided the $Q_i$'s satisfy certain non-degeneracy conditions and $n$ is larger than $d^2$. The set of degenerate $Q_i$'s (i.e., inputs for which the algorithm does not work) forms a non-trivial variety, and hence if the $Q_i$'s are chosen according to any reasonable (full-dimensional) distribution, then they are non-degenerate with high probability (if $s$ is not too large). Our algorithm is based on a scheme for obtaining a learning algorithm for an arithmetic circuit model from a lower bound for the same model, provided certain non-degeneracy conditions hold. The scheme reduces the learning problem to the problem of decomposing two vector spaces under the action of a set of linear operators, where the spaces and the operators are derived from the input circuit and the complexity measure used in a typical lower bound proof. The non-degeneracy conditions are certain restrictions on how the spaces decompose.
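For concreteness, here is a small hypothetical instance of this model with $t = 2$, $m = 3$, and hence $d = tm = 6$:
$$f = 3\,(x_1^2 + x_2x_3)^3 - 2\,(x_1x_2 - x_3^2 + x_4^2)^3,$$
so $s = 2$, $c_1 = 3$, $c_2 = -2$, and each $Q_i$ is a homogeneous quadratic; given only black-box access to such an $f$, the algorithm recovers the $Q_i$'s whenever they are non-degenerate.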
Abstract: We study the tradeoff between the statistical error and communication cost of distributed statistical estimation problems in high dimensions. In the distributed sparse Gaussian mean estimation problem, each of the $m$ machines receives $n$ data points from a $d$-dimensional Gaussian distribution with unknown mean $\theta$, which is promised to be $k$-sparse. The machines communicate by message passing and aim to estimate the mean $\theta$. We provide a tight (up to logarithmic factors) tradeoff between the estimation error and the number of bits communicated between the machines. This directly leads to a lower bound for the distributed \textit{sparse linear regression} problem: to achieve the statistical minimax error, the total communication is at least $\Omega(\min\{n,d\}m)$, where $n$ is the number of observations that each machine receives and $d$ is the ambient dimension. These lower bounds improve upon those of [Sha14, SD'14] by allowing a multi-round, iterative communication model. We also give the first optimal simultaneous protocol in the dense case for mean estimation. As our main technique, we prove a \textit{distributed data processing inequality}, a generalization of the usual data processing inequalities, which may be of independent interest and useful for other problems.
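For context, the classical data processing inequality being generalized says that information cannot increase along a Markov chain: if the parameter $\theta$, the data $X$, and the protocol transcript $\Pi$ satisfy $\theta \to X \to \Pi$, then
$$I(\theta; \Pi) \le I(\theta; X).$$
Roughly speaking, the distributed version strengthens this to account for each machine seeing only its own samples, yielding bounds on $I(\theta; \Pi)$ that depend on the communication budget (the precise statement is in the paper).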
Abstract: We explore the connection between dimensionality and communication cost in distributed learning problems. Specifically, we study the problem of estimating the mean $\vec{\theta}$ of an unknown $d$-dimensional Gaussian distribution in the distributed setting. In this problem, the samples from the unknown distribution are distributed among $m$ different machines. The goal is to estimate the mean $\vec{\theta}$ at the optimal minimax rate while communicating as few bits as possible. We show that in this setting, the communication cost scales linearly in the number of dimensions, i.e., one essentially needs to deal with each dimension individually. Applying this result to previous lower bounds for one dimension in the interactive setting \cite{ZDJW13} and to our improved bounds for the simultaneous setting, we prove new lower bounds of $\Omega(md/\log(m))$ and $\Omega(md)$ on the bits of communication needed to achieve the minimax squared loss, in the interactive and simultaneous settings, respectively. To complement these lower bounds, we also demonstrate an interactive protocol achieving the minimax squared loss with $O(md)$ bits of communication, which improves upon the simple simultaneous protocol by a logarithmic factor. Given the strong lower bounds in the general setting, we initiate the study of distributed parameter estimation problems with structured parameters. Specifically, when the parameter is promised to be $s$-sparse, we show a simple thresholding-based protocol that achieves the same squared loss while saving a $d/s$ factor of communication. We conjecture that the tradeoff between communication and squared loss demonstrated by this protocol is essentially optimal up to logarithmic factors.
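A minimal sketch of one natural thresholding-based protocol in this spirit (an illustration under assumed details, not the exact protocol from the paper): each machine reports only the coordinates of its local sample mean whose magnitude exceeds a threshold, so for an $s$-sparse mean only about $s$ of the $d$ coordinates are ever communicated.

    import numpy as np

    # Each machine sends only the (index, value) pairs of local-mean
    # coordinates whose magnitude exceeds the threshold tau.
    def machine_message(x, tau):
        mu = x.mean(axis=0)  # local sample mean of the (n, d) samples
        idx = np.flatnonzero(np.abs(mu) > tau)
        return idx, mu[idx]

    # The center averages the reported values per coordinate and keeps
    # the s largest-magnitude coordinates as its sparse estimate.
    def center_estimate(messages, d, s):
        sums, counts = np.zeros(d), np.zeros(d)
        for idx, vals in messages:
            sums[idx] += vals
            counts[idx] += 1
        avg = sums / np.maximum(counts, 1)  # 0 where nothing was reported
        est = np.zeros(d)
        keep = np.argsort(np.abs(avg))[-s:]
        est[keep] = avg[keep]
        return est

Here the threshold tau would be tuned to the noise level of a local mean, e.g. on the order of $\sqrt{\log(d)/n}$, so that non-signal coordinates are rarely transmitted; this choice and its analysis are assumptions of the sketch, not claims about the paper's protocol.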
Abstract: Most image-search approaches today are based on text-based tags associated with images, which are mostly human-generated and subject to various kinds of errors. The results of a query to the image database can thus often be misleading and may not satisfy the user's requirements. In this work we propose an approach to automating this tagging process, whereby the generated image results can be fine-filtered using a probabilistic tagging mechanism. We implement a tool that helps automate the tagging process by maintaining a training database: the system is trained to identify a certain set of input images, and the results are used to build a probabilistic tagging mechanism. Given a set of segments in an image, the tool calculates the probability that particular keywords are present. This probability table is then used to generate candidate tags for input images.
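A minimal sketch of how such a probability table could be built and used (the segment descriptors, the averaging rule, and the cutoff are illustrative assumptions, not details from the paper):

    from collections import Counter, defaultdict

    # Build P(keyword | segment descriptor) from a labeled training set.
    # training_data: iterable of (segments, keywords) pairs, where segments
    # is a list of discrete segment descriptors for one image and keywords
    # is the set of tags known to apply to that image.
    def build_tag_table(training_data):
        seg_counts = Counter()
        joint = defaultdict(Counter)
        for segments, keywords in training_data:
            for seg in segments:
                seg_counts[seg] += 1
                for kw in keywords:
                    joint[seg][kw] += 1
        return {seg: {kw: c / seg_counts[seg] for kw, c in kws.items()}
                for seg, kws in joint.items()}

    # Propose candidate tags for a new image: average each keyword's
    # probability over the image's segments and keep those above a cutoff.
    def candidate_tags(segments, table, cutoff=0.5):
        scores = defaultdict(float)
        for seg in segments:
            for kw, p in table.get(seg, {}).items():
                scores[kw] += p / len(segments)
        return sorted(kw for kw, sc in scores.items() if sc >= cutoff)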