Abstract:Reinforcement Learning (RL) is a widely researched area in artificial intelligence that focuses on teaching agents decision-making through interactions with their environment. A key subset includes stochastic multi-armed bandit (MAB) and continuum-armed bandit (SCAB) problems, which model sequential decision-making under uncertainty. This review outlines the foundational models and assumptions of bandit problems, explores non-asymptotic theoretical tools like concentration inequalities and minimax regret bounds, and compares frequentist and Bayesian algorithms for managing exploration-exploitation trade-offs. We also extend the discussion to $K$-armed contextual bandits and SCAB, examining their methodologies, regret analyses, and discussing the relation between the SCAB problems and the functional data analysis. Finally, we highlight recent advances and ongoing challenges in the field.
Abstract:Many real applications of bandits have sparse non-zero rewards, leading to slow learning rates. A careful distribution modeling that utilizes problem-specific structures is known as critical to estimation efficiency in the statistics literature, yet is under-explored in bandits. To fill the gap, we initiate the study of zero-inflated bandits, where the reward is modeled as a classic semi-parametric distribution called zero-inflated distribution. We carefully design Upper Confidence Bound (UCB) and Thompson Sampling (TS) algorithms for this specific structure. Our algorithms are suitable for a very general class of reward distributions, operating under tail assumptions that are considerably less stringent than the typical sub-Gaussian requirements. Theoretically, we derive the regret bounds for both the UCB and TS algorithms for multi-armed bandit, showing that they can achieve rate-optimal regret when the reward distribution is sub-Gaussian. The superior empirical performance of the proposed methods is shown via extensive numerical studies.
Abstract:In non-asymptotic statistical inferences, variance-type parameters of sub-Gaussian distributions play a crucial role. However, direct estimation of these parameters based on the empirical moment generating function (MGF) is infeasible. To this end, we recommend using a sub-Gaussian intrinsic moment norm [Buldygin and Kozachenko (2000), Theorem 1.3] through maximizing a series of normalized moments. Importantly, the recommended norm can not only recover the exponential moment bounds for the corresponding MGFs, but also lead to tighter Hoeffding's sub-Gaussian concentration inequalities. In practice, {\color{black} we propose an intuitive way of checking sub-Gaussian data with a finite sample size by the sub-Gaussian plot}. Intrinsic moment norm can be robustly estimated via a simple plug-in approach. Our theoretical results are applied to non-asymptotic analysis, including the multi-armed bandit.
Abstract:Despite the great interest in the bandit problem, designing efficient algorithms for complex models remains challenging, as there is typically no analytical way to quantify uncertainty. In this paper, we propose Multiplier Bootstrap-based Exploration (MBE), a novel exploration strategy that is applicable to any reward model amenable to weighted loss minimization. We prove both instance-dependent and instance-independent rate-optimal regret bounds for MBE in sub-Gaussian multi-armed bandits. With extensive simulation and real data experiments, we show the generality and adaptivity of MBE.
Abstract:This paper studies the consistency and statistical inference of simulated Ising models in the high dimensional background. Our estimators are based on the Markov chain Monte Carlo maximum likelihood estimation (MCMC-MLE) method penalized by the Elastic-net. Under mild conditions that ensure a specific convergence rate of MCMC method, the $\ell_{1}$ consistency of Elastic-net-penalized MCMC-MLE is proved. We further propose a decorrelated score test based on the decorrelated score function and prove the asymptotic normality of the score function without the influence of many nuisance parameters under the assumption that accelerates the convergence of the MCMC method. The one-step estimator for a single parameter of interest is purposed by linearizing the decorrelated score function to solve its root, as well as its normality and confidence interval for the true value, therefore, be established. Finally, we use different algorithms to control the false discovery rate (FDR) via traditional p-values and novel e-values.
Abstract:For the differential privacy under the sub-Gamma noise, we derive the asymptotic properties of a class of network models with binary values with a general link function. In this paper, we release the degree sequences of the binary networks under a general noisy mechanism with the discrete Laplace mechanism as a special case. We establish the asymptotic result including both consistency and asymptotically normality of the parameter estimator when the number of parameters goes to infinity in a class of network models. Simulations and a real data example are provided to illustrate asymptotic results.
Abstract:This paper studies the non-asymptotic merits of the double $\ell_1$-regularized for heterogeneous overdispersed count data via negative binomial regressions. Under the restricted eigenvalue conditions, we prove the oracle inequalities for Lasso estimators of two partial regression coefficients for the first time, using concentration inequalities of empirical processes. Furthermore, derived from the oracle inequalities, the consistency and convergence rate for the estimators are the theoretical guarantees for further statistical inference. Finally, both simulations and a real data analysis demonstrate that the new methods are effective.
Abstract:Arising in high-dimensional probability, non-asymptotic concentration inequalities play an essential role in the finite-sample theory of machine learning and high-dimensional statistics. In this article, we obtain a sharper and constants-specified concentration inequality for the summation of independent sub-Weibull random variables, which leads to a mixture of two tails: sub-Gaussian for small deviations and sub-Weibull for large deviations (from mean). These bounds improve existing bounds with sharper constants. In the application of random matrices, we derive non-asymptotic versions of Bai-Yin's theorem for sub-Weibull entries and it extends the previous result in terms of sub-Gaussian entries. In the application of negative binomial regressions, we gives the $\ell_2$-error of the estimated coefficients when covariate vector $X$ is sub-Weibull distributed with sparse structures, which is a new result for negative binomial regressions.
Abstract:Computed tomography is widely used to examine internal structures in a non-destructive manner. To obtain high-quality reconstructions, one typically has to acquire a densely sampled trajectory to avoid angular undersampling. However, many scenarios require a sparse-view measurement leading to streak-artifacts if unaccounted for. Current methods do not make full use of the domain-specific information, and hence fail to provide reliable reconstructions for highly undersampled data. We present a novel framework for sparse-view tomography by decoupling the reconstruction into two steps: First, we overcome its ill-posedness using a super-resolution network, SIN, trained on the sparse projections. The intermediate result allows for a closed-form tomographic reconstruction with preserved details and highly reduced streak-artifacts. Second, a refinement network, PRN, trained on the reconstructions reduces any remaining artifacts. We further propose a light-weight variant of the perceptual-loss that enhances domain-specific information, boosting restoration accuracy. Our experiments demonstrate an improvement over current solutions by 4 dB.
Abstract:This paper aims to build an estimate of an unknown density of the data with measurement error as a linear combination of functions of a dictionary. Inspired by penalization approach, we propose the weighted Elastic-net penalized minimal $L_2$-distance method for sparse coefficients estimation, where the weights adaptively coming from sharp concentration inequalities. The optimal weighted tuning parameters are obtained by the first-order conditions holding with high-probability. Under local coherence or minimal eigenvalue assumptions, non-asymptotical oracle inequalities are derived. These theoretical results are transposed to obtain the support recovery with high-probability. Then, the issue of calibrating these procedures is studied by some numerical experiments for discrete and continuous distributions, it shows the significant improvement obtained by our procedure when compared with other conventional approaches. Finally, the application is performed for a meteorology data set. It shows that our method has potency and superiority of detecting the shape of multi-mode density compared with other conventional approaches.