Shubham Anand Jain

Sequential Community Mode Estimation

Nov 16, 2021

Shubham Anand Jain, Shreyas Goenka, Divyam Bapna, Nikhil Karamchandani, Jayakrishnan Nair

Abstract: We consider a population, partitioned into a set of communities, and study the problem of identifying the largest community within the population via sequential, random sampling of individuals. There are multiple sampling domains, referred to as boxes, which also partition the population. Each box may consist of individuals of different communities, and each community may in turn be spread across multiple boxes. The learning agent can, at any time, sample (with replacement) a random individual from any chosen box; when this is done, the agent learns the community the sampled individual belongs to, and also whether or not this individual has been sampled before. The goal of the agent is to minimize the probability of mis-identifying the largest community in a fixed-budget setting, by optimizing both the sampling strategy and the decision rule. We propose and analyse novel algorithms for this problem, and also establish information-theoretic lower bounds on the probability of error under any algorithm. In several cases of interest, the exponential decay rates of the probability of error under our algorithms are shown to be optimal up to constant factors. The proposed algorithms are further validated via simulations on real-world datasets.

* Presented in part at Performance'21. Full version in Elsevier Performance Evaluation, Dec. 21
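
The sampling model in the abstract above can be illustrated with a small simulation. This is only a sketch of a naive baseline, not one of the paper's algorithms: it assumes the budget is spent round-robin over the boxes and that the decision rule picks the community with the most distinct individuals seen. The function name `estimate_largest_community` and the list-of-labels representation of boxes are hypothetical choices made for this example.

```python
import random
from collections import Counter


def estimate_largest_community(boxes, budget, rng):
    """Naive fixed-budget baseline (an assumption, not the paper's method).

    boxes  : list of boxes; each box is a list of community labels,
             one entry per individual in that box.
    budget : total number of samples the agent may draw (must be >= 1).
    rng    : a random.Random instance used for sampling.
    """
    # Per-box record of which individuals have already been sampled;
    # this models the "sampled before or not" feedback.
    seen = [set() for _ in boxes]
    distinct = Counter()  # distinct individuals seen per community

    for t in range(budget):
        b = t % len(boxes)                  # round-robin box choice
        idx = rng.randrange(len(boxes[b]))  # sample with replacement
        if idx not in seen[b]:
            seen[b].add(idx)
            distinct[boxes[b][idx]] += 1

    # Decision rule: community with the most distinct individuals observed.
    return distinct.most_common(1)[0][0]


if __name__ == "__main__":
    boxes = [["a", "a", "a", "b"], ["a", "b", "c"]]
    print(estimate_largest_community(boxes, budget=200, rng=random.Random(0)))
```

Counting distinct individuals rather than raw samples is one simple way to use the identity feedback; a smarter allocation would spend more of the budget on the boxes that matter most for separating the top communities.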

PAC Mode Estimation using PPR Martingale Confidence Sequences

Sep 10, 2021

Shubham Anand Jain, Sanit Gupta, Denil Mehta, Inderjeet Jayakumar Nair, Rohan Shah, Jian Vora, Sushil Khyalia, Sourav Das, Vinay J. Ribeiro, Shivaram Kalyanakrishnan

Abstract: We consider the problem of correctly identifying the mode of a discrete distribution $\mathcal{P}$ with sufficiently high probability by observing a sequence of i.i.d. samples drawn according to $\mathcal{P}$. This problem reduces to the estimation of a single parameter when $\mathcal{P}$ has a support set of size $K = 2$. Noting the efficiency of prior-posterior-ratio (PPR) martingale confidence sequences for handling this special case, we propose a generalisation to mode estimation, in which $\mathcal{P}$ may take $K \geq 2$ values. We observe that the "one-versus-one" principle yields a more efficient generalisation than the "one-versus-rest" alternative. Our resulting stopping rule, denoted PPR-ME, is optimal in its sample complexity up to a logarithmic factor. Moreover, PPR-ME empirically outperforms several other competing approaches for mode estimation. We demonstrate the gains offered by PPR-ME in two practical applications: (1) sample-based forecasting of the winner in indirect election systems, and (2) efficient verification of smart contracts in permissionless blockchains.

* 30 pages, 2 figures
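
The "one-versus-one" idea in the abstract above can be sketched as a stopping rule. The code below is an illustration under stated assumptions, not a reproduction of PPR-ME: it assumes a uniform Beta(1, 1) prior for each pair, in which case the prior-posterior ratio at 1/2 is the reciprocal of the Beta posterior density there, and it assumes the error budget `delta` is split as `delta / (k - 1)` across pairs. The function names are hypothetical.

```python
import math


def beta_logpdf_at_half(a, b):
    """Log density of Beta(a, b) evaluated at x = 0.5."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a + b - 2) * math.log(0.5)


def pair_separated(n_i, n_j, alpha):
    """True once 1/2 leaves the PPR confidence set for the pair (i, j).

    Only samples landing on i or j are used. With a uniform prior
    (an assumption), the prior density is 1, so the prior-posterior
    ratio at 1/2 reaches 1/alpha exactly when the posterior density
    at 1/2 drops to alpha or below.
    """
    return beta_logpdf_at_half(n_i + 1, n_j + 1) <= math.log(alpha)


def ppr_one_vs_one(sample, k, delta, max_samples=100_000):
    """Draw samples until the empirical leader is separated from all rivals.

    sample : zero-argument callable returning a value in range(k).
    Returns (mode_estimate, samples_used), or (None, max_samples)
    if the rule has not stopped within max_samples draws.
    """
    counts = [0] * k
    alpha = delta / (k - 1)  # assumed union-bound split across pairs
    for t in range(1, max_samples + 1):
        counts[sample()] += 1
        leader = max(range(k), key=counts.__getitem__)
        if all(pair_separated(counts[leader], counts[j], alpha)
               for j in range(k) if j != leader):
            return leader, t
    return None, max_samples


if __name__ == "__main__":
    import random
    rng = random.Random(0)
    draw = lambda: rng.choices(range(3), weights=[0.6, 0.3, 0.1])[0]
    print(ppr_one_vs_one(draw, k=3, delta=0.05))
```

Each pairwise test ignores samples from the other `k - 2` values, which is what distinguishes this from a one-versus-rest comparison of the leader against everything else pooled together.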
