Jing Lei

Specific multi-emitter identification via multi-label learning

Sep 26, 2025

Yuhao Chen, Boxiang He, Shilian Wang, Jing Lei

Abstract:Specific emitter identification leverages hardware-induced impairments to uniquely determine a specific transmitter. However, existing approaches fail to address scenarios where signals from multiple emitters overlap. In this paper, we propose a specific multi-emitter identification (SMEI) method via multi-label learning to determine multiple transmitters. Specifically, the multi-emitter fingerprint extractor is designed to mitigate the mutual interference among overlapping signals. Then, the multi-emitter decision maker is proposed to assign all emitter identities using the previously extracted fingerprint. Experimental results demonstrate that, compared with the baseline approach, the proposed SMEI scheme achieves comparable identification accuracy under various overlapping conditions, while operating at significantly lower complexity. The significance of this paper is that it identifies multiple emitters from overlapped signals with low complexity.
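
To make the multi-label idea concrete, here is a minimal sketch of a generic multi-label decision step: each emitter gets its own score, and every emitter whose sigmoid probability clears a threshold is reported as present. The function name `decide_emitters`, the logit inputs, and the 0.5 threshold are illustrative assumptions; the abstract does not describe the actual decision maker used in SMEI.

```python
import math


def decide_emitters(logits, threshold=0.5):
    """Return indices of emitters judged present (hypothetical helper).

    logits: one real-valued score per known emitter, e.g. produced by
            some fingerprint extractor (not modeled here).
    threshold: assumed probability cutoff applied to each emitter independently.
    """
    # Independent sigmoid per label: several emitters can be active at once,
    # unlike a softmax, which would force a single winner.
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [i for i, p in enumerate(probs) if p >= threshold]


# Example call with made-up scores for three emitters.
active = decide_emitters([2.0, -3.0, 0.1])
```

The independent per-label threshold is what lets the output contain zero, one, or several emitters for a single received signal.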

StablePCA: Learning Shared Representations across Multiple Sources via Minimax Optimization

May 02, 2025

Zhenyu Wang, Molei Liu, Jing Lei, Francis Bach, Zijian Guo

Abstract:When synthesizing multisource high-dimensional data, a key objective is to extract low-dimensional feature representations that effectively approximate the original features across different sources. Such general feature extraction facilitates the discovery of transferable knowledge, mitigates systematic biases such as batch effects, and promotes fairness. In this paper, we propose Stable Principal Component Analysis (StablePCA), a novel method for group distributionally robust learning of latent representations from high-dimensional multi-source data. A primary challenge in generalizing PCA to the multi-source regime lies in the nonconvexity of the fixed rank constraint, rendering the minimax optimization nonconvex. To address this challenge, we employ the Fantope relaxation, reformulating the problem as a convex minimax optimization, with the objective defined as the maximum loss across sources. To solve the relaxed formulation, we devise an optimistic-gradient Mirror Prox algorithm with explicit closed-form updates. Theoretically, we establish the global convergence of the Mirror Prox algorithm, with the convergence rate provided from the optimization perspective. Furthermore, we offer practical criteria to assess how closely the solution approximates the original nonconvex formulation. Through extensive numerical experiments, we demonstrate StablePCA's high accuracy and efficiency in extracting robust low-dimensional representations across various finite-sample scenarios.

Winners with Confidence: Discrete Argmin Inference with an Application to Model Selection

Aug 04, 2024

Tianyu Zhang, Hao Lee, Jing Lei

Abstract:We study the problem of finding the index of the minimum value of a vector from noisy observations. This problem is relevant in population/policy comparison, discrete maximum likelihood, and model selection. We develop a test statistic that is asymptotically normal, even in high-dimensional settings and with potentially many ties in the population mean vector, by integrating concepts and tools from cross-validation and differential privacy. The key technical ingredient is a central limit theorem for globally dependent data. We also propose practical ways to select the tuning parameter that adapts to the signal landscape.

Online Estimation with Rolling Validation: Adaptive Nonparametric Estimation with Stream Data

Oct 18, 2023

Tianyu Zhang, Jing Lei

Abstract:Online nonparametric estimators are gaining popularity due to their efficient computation and competitive generalization abilities. An important example includes variants of stochastic gradient descent. These algorithms often take one sample point at a time and instantly update the parameter estimate of interest. In this work we consider model selection and hyperparameter tuning for such online algorithms. We propose a weighted rolling-validation procedure, an online variant of leave-one-out cross-validation, that costs minimal extra computation for many typical stochastic gradient descent estimators. Similar to batch cross-validation, it can boost base estimators to achieve a better, adaptive convergence rate. Our theoretical analysis is straightforward, relying mainly on some general statistical stability assumptions. The simulation study underscores the significance of diverging weights in rolling validation in practice and demonstrates its sensitivity even when there is only a slim difference between candidate estimators.
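
The rolling-validation idea can be sketched roughly as follows, under simplifying assumptions: the candidates are scalar linear SGD estimators that differ only in a step-size constant, each new sample is first used to score every candidate's prediction and only then to update it, and sample t is weighted by t**xi. The function name, the step-size grid, and the 1/sqrt(t) schedule are illustrative choices, not details taken from the paper.

```python
import random


def rolling_validation(stream, step_sizes, xi=1.0):
    """Select an SGD step size online via weighted rolling validation.

    stream: iterable of (x, y) pairs with scalar x and y.
    step_sizes: candidate step-size constants (hypothetical tuning grid).
    xi: weight exponent; sample t contributes t**xi to each running score.
    """
    betas = [0.0] * len(step_sizes)   # one slope estimate per candidate
    scores = [0.0] * len(step_sizes)  # weighted cumulative prediction error
    for t, (x, y) in enumerate(stream, start=1):
        for k, c in enumerate(step_sizes):
            resid = y - betas[k] * x             # predict before updating
            scores[k] += (t ** xi) * resid ** 2  # later samples weigh more
            betas[k] += (c / t ** 0.5) * resid * x  # SGD step on squared loss
    best = min(range(len(step_sizes)), key=scores.__getitem__)
    return best, betas, scores


# Synthetic stream y = 2x + noise (made-up data for illustration).
rng = random.Random(0)
data = []
for _ in range(2000):
    x = rng.uniform(-1.0, 1.0)
    data.append((x, 2.0 * x + rng.gauss(0.0, 0.1)))
best, betas, scores = rolling_validation(data, step_sizes=[1e-4, 0.5])
```

Because each prediction is made before the sample is used for the update, the score costs only one extra residual evaluation per candidate per step.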

Detecting Errors in Numerical Data via any Regression Model

Jun 03, 2023

Hang Zhou, Jonas Mueller, Mayank Kumar, Jane-Ling Wang, Jing Lei

Abstract:Noise plagues many numerical datasets, where the recorded values in the data may fail to match the true underlying values due to reasons including: erroneous sensors, data entry/processing mistakes, or imperfect human estimates. Here we consider estimating which data values are incorrect along a numerical column. We present a model-agnostic approach that can utilize any regressor (i.e. statistical or machine learning model) which was fit to predict values in this column based on the other variables in the dataset. By accounting for various uncertainties, our approach distinguishes between genuine anomalies and natural data fluctuations, conditioned on the available information in the dataset. We establish theoretical guarantees for our method and show that other approaches like conformal inference struggle to detect errors. We also contribute a new error detection benchmark involving 5 regression datasets with real-world numerical errors (for which the true values are also known). In this benchmark and additional simulation studies, our method identifies incorrect values with better precision/recall than other approaches.

A Novel K-Repetition Design for SCMA

May 17, 2022

Ke Lai, Zilong Liu, Jing Lei, Lei Wen, Gaojie Chen, Pei Xiao

Abstract:This work presents a novel K-Repetition based HARQ scheme for LDPC coded uplink SCMA by employing a network coding (NC) principle to encode different packets, where K-Repetition is an emerging technique (recommended in 3GPP Release 15) for enhanced reliability and reduced latency in future massive machine-type communication. Such a scheme is referred to as the NC aided K-repetition SCMA (NCK-SCMA). We introduce a joint iterative detection algorithm for improved detection of the data from the proposed LDPC coded NCK-SCMA systems. Simulation results demonstrate the benefits of NCK-SCMA with higher throughput and improved reliability over the conventional K-Repetition SCMA.

* 6 pages, 6 figures

Analyzing Uplink Grant-free Sparse Code Multiple Access System in Massive IoT Networks

Mar 18, 2021

Ke Lai, Jing Lei, Yansha Deng, Lei Wen, Gaojie Chen

Abstract:Grant-free sparse code multiple access (GF-SCMA) is considered to be a promising multiple access candidate for future wireless networks. In this paper, we focus on characterizing the performance of uplink GF-SCMA schemes in a network with ubiquitous connections, such as the Internet of Things (IoT) networks. To provide a tractable approach to evaluate the performance of GF-SCMA, we first develop a theoretical model taking into account the property of multi-user detection (MUD) in the SCMA system. We then analyze the error rate performance of GF-SCMA in the case of codebook collision to investigate the reliability of GF-SCMA when reusing codebooks in massive IoT networks. For performance evaluation, accurate approximations for both success probability and average symbol error probability (ASEP) are derived. To elaborate further, we utilize the analytical results to discuss the impact of codeword sparse degree in GF-SCMA. After that, we conduct a comparative study between SCMA and its variant, dense code multiple access (DCMA), with GF transmission to offer insights into the effectiveness of these two schemes. This facilitates the GF-SCMA system design in practical implementation. Simulation results show that denser codebooks can help to support more UEs and increase the reliability of data transmission in a GF-SCMA network. Moreover, a higher success probability can be achieved by GF-SCMA with denser UE deployment at low detection thresholds since SCMA can achieve overloading gain.

Gradient-based Sparse Principal Component Analysis with Extensions to Online Learning

Nov 19, 2019

Yixuan Qiu, Jing Lei, Kathryn Roeder

Abstract:Sparse principal component analysis (PCA) is an important technique for dimensionality reduction of high-dimensional data. However, most existing sparse PCA algorithms are based on non-convex optimization, which provides little guarantee of global convergence. Sparse PCA algorithms based on a convex formulation, for example the Fantope projection and selection (FPS), overcome this difficulty, but are computationally expensive. In this work we study sparse PCA based on the convex FPS formulation, and propose a new algorithm that is computationally efficient and applicable to large and high-dimensional data sets. Nonasymptotic and explicit bounds are derived for both the optimization error and the statistical accuracy, which can be used for testing and inference problems. We also extend our algorithm to online learning problems, where data are obtained in a streaming fashion. The proposed algorithm is applied to high-dimensional gene expression data for the detection of functional gene groups.
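
For readers unfamiliar with the Fantope, the convex set behind the FPS formulation is {X symmetric : 0 ⪯ X ⪯ I, tr(X) = d}. The sketch below shows the standard Frobenius-norm projection onto that set: eigendecompose, shift the eigenvalues by a common offset, and clip them to [0, 1] so that they sum to d. It illustrates the constraint set only and is not the gradient-based algorithm proposed in the paper; the function name and the bisection iteration count are assumptions made for this example.

```python
import numpy as np


def fantope_projection(A, d, n_iter=100):
    """Project a symmetric matrix A onto the Fantope of dimension d.

    The projection keeps the eigenvectors of A and replaces each eigenvalue
    lam_i by clip(lam_i - theta, 0, 1), where theta is chosen so that the
    new eigenvalues sum to d.
    """
    A = (A + A.T) / 2.0                      # symmetrize against round-off
    lam, U = np.linalg.eigh(A)
    # g(theta) = sum(clip(lam - theta, 0, 1)) is nonincreasing in theta,
    # equals len(lam) >= d at lo and 0 <= d at hi, so bisection applies.
    lo, hi = lam.min() - 1.0, lam.max()
    for _ in range(n_iter):
        theta = (lo + hi) / 2.0
        if np.clip(lam - theta, 0.0, 1.0).sum() > d:
            lo = theta
        else:
            hi = theta
    gamma = np.clip(lam - (lo + hi) / 2.0, 0.0, 1.0)
    return (U * gamma) @ U.T                 # U diag(gamma) U^T


# Example call on a random symmetric matrix (made-up input).
rng = np.random.default_rng(0)
B = rng.standard_normal((6, 6))
X = fantope_projection(B + B.T, d=2)
```

The full eigendecomposition in this step is the kind of per-iteration cost that makes projection-based convex solvers expensive in high dimensions.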

Convergence and Concentration of Empirical Measures under Wasserstein Distance in Unbounded Functional Spaces

Apr 27, 2018

Jing Lei

Abstract:We provide upper bounds of the expected Wasserstein distance between a probability measure and its empirical version, generalizing recent results for finite dimensional Euclidean spaces and bounded functional spaces. Such a generalization can cover Euclidean spaces with large dimensionality, with the optimal dependence on the dimensionality. Our method also covers the important case of Gaussian processes in separable Hilbert spaces, with rate-optimal upper bounds for functional data distributions whose coordinates decay geometrically or polynomially. Moreover, our bounds of the expected value can be combined with mean-concentration results to yield improved exponential tail probability bounds for the Wasserstein error of empirical measures under a Bernstein-type tail condition.
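
As a small illustration of the quantity being bounded, the sketch below computes the 1-Wasserstein distance between two empirical measures on the real line with equal sample sizes, where the optimal coupling simply matches sorted samples. This one-dimensional special case is chosen only because it has a closed form; the paper itself concerns high-dimensional and functional spaces, and the function name is hypothetical.

```python
def empirical_w1(xs, ys):
    """1-Wasserstein distance between two equal-size empirical measures on R.

    In one dimension the optimal transport plan pairs the i-th smallest
    point of xs with the i-th smallest point of ys.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in pairs) / len(xs)


# Example call with two tiny made-up samples.
dist = empirical_w1([0.0, 1.0], [2.0, 1.0])
```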

* 32 pages

Cross-Validation with Confidence

Dec 22, 2017

Jing Lei

Abstract:Cross-validation is one of the most popular model selection methods in statistics and machine learning. Despite its wide applicability, traditional cross-validation methods tend to select overfitting models because they ignore the uncertainty in the testing sample. We develop a new, statistically principled inference tool based on cross-validation that takes into account the uncertainty in the testing sample. This new method outputs a set of highly competitive candidate models containing the best one with guaranteed probability. As a consequence, our method can achieve consistent variable selection in a classical linear regression setting, for which existing cross-validation methods require unconventional split ratios. When used for regularizing tuning parameter selection, the method can provide a further trade-off between prediction accuracy and model interpretability. We demonstrate the performance of the proposed method in several simulated and real data examples.

* 35 pages, 5 figures
