Abstract: We show that the Adaptive Greedy algorithm of Golovin and Krause (2011) achieves an approximation bound of $(\ln(Q/\eta)+1)$ for Stochastic Submodular Cover, where $Q$ is the "goal value" and $\eta$ is the smallest non-zero marginal increase in utility deliverable by an item. (For integer-valued utility functions, we show a bound of $H(Q)$, where $H(Q)$ is the $Q^{\text{th}}$ Harmonic number.) Although this bound was claimed by Golovin and Krause in the original version of their paper, the proof was later shown to be incorrect by Nan and Saligrama (2017). The subsequent corrected proof of Golovin and Krause (2017) gives a quadratic bound of $(\ln(Q/\eta) + 1)^2$. Other previous bounds for the problem are $56(\ln(Q/\eta) + 1)$, implied by work of Im et al. (2016) on a related problem, and $k(\ln(Q/\eta)+1)$, due to Deshpande et al. (2016) and Hellerstein and Kletenik (2018), where $k$ is the number of states. Our bound generalizes the well-known $(\ln m + 1)$ approximation bound on the greedy algorithm for the classical Set Cover problem, where $m$ is the size of the ground set.
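To make the Set Cover special case concrete, the following is a minimal sketch of the classical (deterministic, unit-cost) greedy algorithm, not the Adaptive Greedy algorithm analyzed in the paper. It assumes the utility of a collection of subsets is the number of ground-set elements it covers, so that $Q = m$ and $\eta = 1$, and the bound $H(Q)$ becomes $H(m) \le \ln m + 1$. The names `greedy_set_cover` and `harmonic`, and the example instance, are illustrative assumptions.

```python
from math import log


def greedy_set_cover(universe, subsets):
    """Classical greedy: repeatedly pick the subset covering the most uncovered elements.

    Returns the indices of the chosen subsets, in the order they were picked.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Index of the subset with the largest marginal coverage (ties: lowest index).
        best = max(range(len(subsets)), key=lambda i: len(subsets[i] & uncovered))
        gain = subsets[best] & uncovered
        if not gain:
            raise ValueError("subsets do not cover the universe")
        chosen.append(best)
        uncovered -= gain
    return chosen


def harmonic(q):
    """Return the q-th Harmonic number H(q) = 1 + 1/2 + ... + 1/q."""
    return sum(1.0 / i for i in range(1, q + 1))


# Hypothetical instance: ground set of size m = 6.
universe = set(range(1, 7))
subsets = [{1, 2, 3, 4}, {1, 2, 5}, {3, 4, 6}, {5, 6}]
cover = greedy_set_cover(universe, subsets)

m = len(universe)
print("chosen subsets:", cover)
print("H(m) =", harmonic(m), " ln(m) + 1 =", log(m) + 1)
```

On this instance greedy first takes the subset covering four elements and then the one covering the remaining two. The printed comparison illustrates that $H(m)$ never exceeds $\ln m + 1$.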
Abstract: Many problems in Machine Learning can be modeled as submodular optimization problems. Recent work has focused on stochastic or adaptive versions of these problems. We consider the Scenario Submodular Cover problem, which is a counterpart to the Stochastic Submodular Cover problem studied by Golovin and Krause. In Scenario Submodular Cover, the goal is to produce a cover with minimum expected cost, where the expectation is with respect to an empirical joint distribution, given as input by a weighted sample of realizations. In contrast, in Stochastic Submodular Cover, the variables of the input distribution are assumed to be independent, and the distribution of each variable is given as input. Building on algorithms developed by Cicalese et al. and Golovin and Krause for related problems, we give two approximation algorithms for Scenario Submodular Cover over discrete distributions. The first achieves an approximation factor of O(log Qm), where m is the size of the sample and Q is the goal utility. The second, simpler algorithm achieves an approximation bound of O(log QW), where W is the sum of the integer weights. (Both bounds assume an integer-valued utility function.) Our results yield approximation bounds for other problems involving non-independent distributions that are explicitly specified by their support.
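As a rough illustration of what "expected cost with respect to a weighted sample" means, the sketch below evaluates a fixed (non-adaptive) item ordering against a small weighted sample of realizations. It does not implement either approximation algorithm from the paper. The function name `expected_cost`, the example utility (a standard integer-valued utility for evaluating a Boolean OR, which reaches the goal once a 1 is seen or all items are observed), and the sample itself are assumptions chosen for illustration.

```python
def expected_cost(order, costs, scenarios, utility, goal):
    """Expected cost of querying items in `order` until utility reaches `goal`.

    `scenarios` is a weighted sample: a list of (weight, realization) pairs,
    where each realization maps every item to its state. The expectation is
    taken over the empirical distribution defined by the weights.
    """
    total_weight = sum(weight for weight, _ in scenarios)
    expected = 0.0
    for weight, realization in scenarios:
        observed = {}
        spent = 0
        for item in order:
            if utility(observed) >= goal:
                break  # goal reached in this scenario; stop paying
            observed[item] = realization[item]
            spent += costs[item]
        if utility(observed) < goal:
            raise ValueError("ordering does not reach the goal in some scenario")
        expected += (weight / total_weight) * spent
    return expected


ITEMS = ["a", "b", "c"]
GOAL = len(ITEMS)  # Q = 3


def or_utility(observed):
    """Utility is Q once any observed item is 1; otherwise the number of 0s seen."""
    if any(state == 1 for state in observed.values()):
        return GOAL
    return len(observed)


# Hypothetical weighted sample with integer weights (W = 4, m = 3).
scenarios = [
    (2, {"a": 0, "b": 0, "c": 1}),
    (1, {"a": 1, "b": 0, "c": 0}),
    (1, {"a": 0, "b": 0, "c": 0}),
]
unit_costs = {item: 1 for item in ITEMS}

cost_abc = expected_cost(["a", "b", "c"], unit_costs, scenarios, or_utility, GOAL)
cost_cab = expected_cost(["c", "a", "b"], unit_costs, scenarios, or_utility, GOAL)
print(cost_abc, cost_cab)
```

Because the sample places most of its weight on the scenario in which item `c` is 1, querying `c` first gives a lower expected cost than the order `a`, `b`, `c`. An adaptive strategy could additionally choose the next item based on the states observed so far.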
Abstract: We prove a new structural lemma for partial Boolean functions $f$, which we call the seed lemma for DNF. Using the lemma, we give the first subexponential algorithm for proper learning of DNF in Angluin's Equivalence Query (EQ) model. The algorithm has time and query complexity $2^{\tilde{O}(\sqrt{n})}$, which is optimal. We also give a new result on certificates for DNF-size, a simple algorithm for properly PAC-learning DNF, and new results on EQ-learning $\log n$-term DNF and decision trees.