Abstract:Face clustering can provide pseudo-labels to the massive unlabeled face data and improve the performance of different face recognition models. The existing clustering methods generally aggregate the features within subgraphs that are often implemented based on a uniform threshold or a learned cutoff position. This may reduce the recall of subgraphs and hence degrade the clustering performance. This work proposed an efficient neighborhood-aware subgraph adjustment method that can significantly reduce the noise and improve the recall of the subgraphs, and hence can drive the distant nodes to converge towards the same centers. More specifically, the proposed method consists of two components, i.e. face embeddings enhancement using the embeddings from neighbors, and enclosed subgraph construction of node pairs for structural information extraction. The embeddings are combined to predict the linkage probabilities for all node pairs to replace the cosine similarities to produce new subgraphs that can be further used for aggregation of GCNs or other clustering methods. The proposed method is validated through extensive experiments against a range of clustering solutions using three benchmark datasets and numerical results confirm that it outperforms the SOTA solutions in terms of generalization capability.
Abstract:In this paper, we develop a general theory of truncated inverse binomial sampling. In this theory, the fixed-size sampling and inverse binomial sampling are accommodated as special cases. In particular, the classical Chernoff-Hoeffding bound is an immediate consequence of the theory. Moreover, we propose a rigorous and efficient method for probability estimation, which is an adaptive Monte Carlo estimation method based on truncated inverse binomial sampling. Our proposed method of probability estimation can be orders of magnitude more efficient as compared to existing methods in literature and widely used software.
Abstract:We derive simple concentration inequalities for bounded random vectors, which generalize Hoeffding's inequalities for bounded scalar random variables. As applications, we apply the general results to multinomial and Dirichlet distributions to obtain multivariate concentration inequalities.
Abstract:We propose a new approach for deriving probabilistic inequalities based on bounding likelihood ratios. We demonstrate that this approach is more general and powerful than the classical method frequently used for deriving concentration inequalities such as Chernoff bounds. We discover that the proposed approach is inherently related to statistical concepts such as monotone likelihood ratio, maximum likelihood, and the method of moments for parameter estimation. A connection between the proposed approach and the large deviation theory is also established. We show that, without using moment generating functions, tightest possible concentration inequalities may be readily derived by the proposed approach. We have derived new concentration inequalities using the proposed approach, which cannot be obtained by the classical approach based on moment generating functions.
Abstract:A large class of problems in sciences and engineering can be formulated as the general problem of constructing random intervals with pre-specified coverage probabilities for the mean. Wee propose a general approach for statistical inference of mean values based on accumulated observational data. We show that the construction of such random intervals can be accomplished by comparing the endpoints of random intervals with confidence sequences for the mean. Asymptotic results are obtained for such sequential methods.
Abstract:We first review existing sequential methods for estimating a binomial proportion. Afterward, we propose a new family of group sequential sampling schemes for estimating a binomial proportion with prescribed margin of error and confidence level. In particular, we establish the uniform controllability of coverage probability and the asymptotic optimality for such a family of sampling schemes. Our theoretical results establish the possibility that the parameters of this family of sampling schemes can be determined so that the prescribed level of confidence is guaranteed with little waste of samples. Analytic bounds for the cumulative distribution functions and expectations of sample numbers are derived. Moreover, we discuss the inherent connection of various sampling schemes. Numerical issues are addressed for improving the accuracy and efficiency of computation. Computational experiments are conducted for comparing sampling schemes. Illustrative examples are given for applications in clinical trials.
Abstract:In this paper, we have established a unified framework of multistage parameter estimation. We demonstrate that a wide variety of statistical problems such as fixed-sample-size interval estimation, point estimation with error control, bounded-width confidence intervals, interval estimation following hypothesis testing, construction of confidence sequences, can be cast into the general framework of constructing sequential random intervals with prescribed coverage probabilities. We have developed exact methods for the construction of such sequential random intervals in the context of multistage sampling. In particular, we have established inclusion principle and coverage tuning techniques to control and adjust the coverage probabilities of sequential random intervals. We have obtained concrete sampling schemes which are unprecedentedly efficient in terms of sampling effort as compared to existing procedures.
Abstract:In this paper, we have established a general framework of multistage hypothesis tests which applies to arbitrarily many mutually exclusive and exhaustive composite hypotheses. Within the new framework, we have constructed specific multistage tests which rigorously control the risk of committing decision errors and are more efficient than previous tests in terms of average sample number and the number of sampling operations. Without truncation, the sample numbers of our testing plans are absolutely bounded.
Abstract:In this paper, we propose new sequential estimation methods based on inclusion principle. The main idea is to reformulate the estimation problems as constructing sequential random intervals and use confidence sequences to control the associated coverage probabilities. In contrast to existing asymptotic sequential methods, our estimation procedures rigorously guarantee the pre-specified levels of confidence.
Abstract:In this article, we derive a new generalization of Chebyshev inequality for random vectors. We demonstrate that the new generalization is much less conservative than the classical generalization.