Abstract:Despite the considerable success of Bregman proximal-type algorithms, such as mirror descent, in machine learning, a critical question remains: Can existing stationarity measures, often based on Bregman divergence, reliably distinguish between stationary and non-stationary points? In this paper, we present a groundbreaking finding: All existing stationarity measures necessarily imply the existence of spurious stationary points. We further establish an algorithm-independent hardness result: Bregman proximal-type algorithms are unable to escape from a spurious stationary point in a finite number of steps when the initial point is unfavorable, even for convex problems. Our hardness result highlights the inherent distinction between Euclidean and Bregman geometries, and poses fundamental theoretical and numerical challenges to both the machine learning and optimization communities.
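As background for the hardness claim above, the following toy sketch runs mirror descent with the negative-entropy kernel (the exponentiated-gradient update) on the two-dimensional simplex. It is an illustration constructed here, not the construction used in the paper: the linear objective `c`, the starting point `x0`, the step size, and the function name `entropic_mirror_descent` are all assumptions. The minimizer is the vertex (0, 1), yet an iterate started extremely close to the other vertex barely moves within a small iteration budget.

```python
import numpy as np

def entropic_mirror_descent(grad, x0, step, iters):
    """Mirror descent on the simplex with the negative-entropy kernel.

    Each step is the multiplicative update x_i <- x_i * exp(-step * g_i),
    followed by renormalization onto the simplex.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x * np.exp(-step * grad(x))
        x = x / x.sum()
    return x

# Hypothetical convex (linear) objective f(x) = <c, x>; its minimizer on
# the simplex is the vertex (0, 1).
c = np.array([1.0, 0.0])
grad = lambda x: c

# Unfavorable start: almost all mass sits on the worse coordinate.
x0 = np.array([1.0, 1e-30])

x_early = entropic_mirror_descent(grad, x0, step=1.0, iters=10)
x_late = entropic_mirror_descent(grad, x0, step=1.0, iters=100)
print(x_early)  # still essentially at the vertex (1, 0)
print(x_late)   # eventually close to the minimizer (0, 1)
```

In this toy setting the ratio between the two coordinates grows by a constant factor per step, so the number of iterations needed to leave the neighborhood of the bad vertex grows with the logarithm of the inverse of the initial mass on the good coordinate. For any fixed iteration budget, a sufficiently unfavorable start keeps the iterates near the wrong vertex.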
Abstract:This work introduces DeepCRF, a deep learning framework designed for channel state information-based radio frequency fingerprinting (CSI-RFF). The considered CSI-RFF is built on micro-CSI, a recently discovered radio-frequency (RF) fingerprint that manifests as micro-signals appearing on the channel state information (CSI) curves of commercial WiFi devices. Micro-CSI enables a CSI-RFF scheme that is more streamlined and easier to implement than existing schemes that rely on raw I/Q samples. The primary challenge lies in precisely extracting micro-CSI from the inherently fluctuating CSI measurements, a process critical for reliable RFF. A framework that is resilient to channel variability is therefore essential for the practical deployment of CSI-RFF techniques. DeepCRF addresses this challenge with a carefully trained convolutional neural network (CNN). The network's performance is significantly enhanced by data augmentation techniques that improve its ability to generalize to unseen channel conditions. Furthermore, DeepCRF incorporates supervised contrastive learning to enhance its robustness against noise. Our evaluations demonstrate that DeepCRF significantly improves the accuracy of device identification across previously unencountered channels. It outperforms both conventional model-based methods and standard CNNs that lack our specialized training and enhancement strategies.
Abstract:This paper presents a new radiometric fingerprint that is revealed by micro-signals in the channel state information (CSI) curves extracted from commodity Wi-Fi devices. We refer to this new fingerprint as "micro-CSI". Our experiments show that micro-CSI is likely to be caused by imperfections in the radio-frequency circuitry and is present in Wi-Fi 4/5/6 network interface cards (NICs). We conducted further experiments to determine the most effective CSI collection configuration to stabilize micro-CSI. To extract micro-CSI from varying CSI curves, we developed a signal space-based extraction algorithm that effectively separates distortions caused by wireless channels and hardware imperfections under line-of-sight (LoS) scenarios. Finally, we implemented a micro-CSI-based device authentication algorithm that uses the k-Nearest Neighbors (KNN) method to identify 11 COTS Wi-Fi NICs from the same manufacturer in typical indoor environments. Our experimental results demonstrate that the micro-CSI-based authentication algorithm can achieve an average attack detection rate of over 99% with a false alarm rate of 0%.
Abstract:Due to the finite bandwidth of practical wireless systems, one multipath component can manifest itself as a discrete pulse consisting of multiple taps in the digital delay domain. This effect is called channel leakage, which complicates the multipath delay estimation problem. In this paper, we develop a new algorithm to estimate multipath delays of leaked channels by leveraging the knowledge of pulse-shaping functions, which can be used to support fine-grained WiFi sensing applications. Specifically, we express the channel impulse response (CIR) as a linear combination of overcomplete basis vectors corresponding to different delays. Considering the limited number of paths in physical environments, we formulate the multipath delay estimation as a sparse recovery problem. We then propose a sparse Bayesian learning (SBL) method to estimate the sparse vector and determine the number of physical paths and their associated delay parameters from the positions of the nonzero entries in the sparse vector. Simulation results show that our algorithm can accurately determine the number of paths, and achieve superior accuracy in path delay estimation and channel reconstruction compared to two benchmarking schemes.
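The leakage effect and the overcomplete delay dictionary described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the pulse-shaping function is assumed to be a sinc, the grid spacing, tap count, path gain, and the names `pulse_dictionary`, `delay_grid`, and `Phi` are hypothetical, and a plain normalized correlation stands in for the sparse Bayesian learning step.

```python
import numpy as np

def pulse_dictionary(num_taps, delay_grid):
    """Build a (num_taps x num_delays) dictionary.

    Column j is the assumed pulse (a sinc) sampled at the integer taps and
    shifted by the candidate delay delay_grid[j], in units of samples.
    """
    taps = np.arange(num_taps)[:, None]
    return np.sinc(taps - delay_grid[None, :])

num_taps = 16
delay_grid = 0.1 * np.arange(80)          # candidate delays 0.0, 0.1, ..., 7.9
Phi = pulse_dictionary(num_taps, delay_grid)

# One physical path with an off-grid (non-integer) delay of 2.4 samples.
true_delay = 2.4
h = 0.8 * np.sinc(np.arange(num_taps) - true_delay)

# Leakage: the single path occupies several taps of the sampled CIR.
leaked_taps = int(np.sum(np.abs(h) > 0.05))

# Stand-in for sparse recovery (NOT SBL): pick the dictionary column that
# correlates best with the observed CIR.
scores = np.abs(Phi.T @ h) / np.linalg.norm(Phi, axis=0)
estimated_delay = delay_grid[np.argmax(scores)]

print(leaked_taps, estimated_delay)
```

In the sparse-recovery view, `h` is modeled as `Phi @ w` with only a few nonzero entries in `w`, and the positions of those entries give the path delays. With several paths and noise, a single correlation peak is no longer sufficient, which is where a sparse estimator such as SBL comes in.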
Abstract:This paper considers a radio-frequency (RF)-based simultaneous localization and source-seeking (SLASS) problem in multi-robot systems, where multiple robots jointly localize themselves and an RF source using distance-only measurements extracted from RF signals and then control themselves to approach the source. We design a Rao-Blackwellized particle filter-based algorithm to realize the joint localization of the robots and the source. We also devise an information-theoretic control policy for the robots to approach the source. In our control policy, we maximize the predicted mutual information between the source position and the distance measurements, conditioned on the robot positions, to incorporate the robot localization uncertainties. A projected gradient ascent method is adopted to solve the mutual information maximization problem. Simulation results show that the proposed SLASS framework outperforms two benchmarks in terms of the root mean square error (RMSE) of the estimated source position and the decline of the distances between the robots and the source, indicating that the robots approach the source more effectively.
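The projected gradient ascent step mentioned above can be sketched generically as follows. This is only an illustration of the optimization pattern: the concave quadratic objective is a hypothetical stand-in for the predicted mutual information, and the norm-ball constraint on the control input, the step size, and the names `project_to_ball` and `projected_gradient_ascent` are assumptions not taken from the paper.

```python
import numpy as np

def project_to_ball(u, radius):
    """Euclidean projection of u onto the ball {u : ||u|| <= radius}."""
    norm = np.linalg.norm(u)
    return u if norm <= radius else u * (radius / norm)

def projected_gradient_ascent(grad, u0, radius, step=0.1, iters=200):
    """Maximize an objective over a norm ball.

    Each iteration takes a gradient ascent step and then projects the
    result back onto the feasible set.
    """
    u = project_to_ball(np.asarray(u0, dtype=float), radius)
    for _ in range(iters):
        u = project_to_ball(u + step * grad(u), radius)
    return u

# Stand-in objective (NOT mutual information): f(u) = -||u - target||^2,
# whose unconstrained maximizer lies outside the feasible ball.
target = np.array([3.0, 4.0])
grad = lambda u: -2.0 * (u - target)

u_star = projected_gradient_ascent(grad, np.zeros(2), radius=1.0)
print(u_star)  # the feasible point closest to target, i.e. target / ||target||
```

The projection keeps every iterate feasible, so the loop can be stopped at any time and still return an admissible control input.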
Abstract:Currently, under supervised learning, pretraining a model on a large-scale natural scene dataset and then fine-tuning it on a small amount of task-specific labeled data is the dominant paradigm for knowledge transfer learning. It has become the consensus solution for task-aware model training in the remote sensing domain (RSD). Unfortunately, due to the different categories of imaging data and the stiff challenges of data annotation, there is no sufficiently large and uniform remote sensing dataset to support large-scale pretraining in RSD. Moreover, pretraining models on large-scale natural scene datasets by supervised learning and then directly fine-tuning them on diverse downstream tasks is a crude approach that is easily affected by inevitable labeling noise, severe domain gaps, and task-aware discrepancies. Thus, in this paper, building on self-supervised pretraining and the powerful vision transformer (ViT) architecture, we propose a concise and effective knowledge transfer learning strategy called ConSecutive PreTraining (CSPT). It is based on the idea of not stopping pretraining from natural language processing (NLP), and it gradually bridges the domain gap and transfers knowledge from the natural scene domain to the RSD. The proposed CSPT can also unlock the huge potential of unlabeled data for task-aware model training. Finally, extensive experiments are carried out on twelve datasets in RSD involving three types of downstream tasks (scene classification, object detection, and land cover classification) and two types of imaging data (optical and SAR). The results show that, with the proposed CSPT for task-aware model training, almost all downstream tasks in RSD outperform the previous supervised pretraining-then-fine-tuning method and even surpass the state-of-the-art (SOTA) performance without expensive labeling or careful model design.
Abstract:Interactions between users and videos are the major data source for video recommendation. Despite the many existing recommendation methods, user behaviors on videos, which imply the complex relations between users and videos, are still far from being fully explored. In this paper, we present a model named Sagittarius. Sagittarius adopts a graph convolutional neural network to capture the influence between users and videos. In particular, Sagittarius differentiates user behaviors by weighting them and fuses the semantics of user behaviors into the embeddings of users and videos. Moreover, Sagittarius combines multiple optimization objectives to learn user and video embeddings and then performs video recommendation using the learned embeddings. The experimental results on multiple datasets show that Sagittarius outperforms several state-of-the-art models in terms of recall, unique recall, and NDCG.
Abstract:In this paper, we aim to improve the computational efficiency of graph convolutional networks (GCNs) for learning on point clouds. We examine the basic graph convolution, which is typically composed of a $K$-nearest neighbor (KNN) search and a multilayer perceptron (MLP). By mathematically analyzing its operations, we obtain two findings that improve the efficiency of GCNs. (1) The local geometric structure information of 3D representations propagates smoothly across a GCN that relies on KNN search to gather neighborhood features. This motivates the simplification of multiple KNN searches in GCNs. (2) Shuffling the order of graph feature gathering and an MLP leads to equivalent or similar composite operations. Based on these findings, we optimize the computational procedure in GCNs. A series of experiments shows that the optimized networks have reduced computational complexity, decreased memory consumption, and accelerated inference speed while maintaining comparable accuracy for learning on point clouds. Code will be available at \url{https://github.com/ofsoundof/EfficientGCN.git}.
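The second finding above can be checked numerically in its simplest form. The sketch below is a simplified illustration, not the authors' implementation: it uses a single linear layer in place of a full MLP, a brute-force KNN, and hypothetical names (`knn_indices`, `feats`, `W`) and sizes. It shows that gathering neighbor features and then applying the linear layer gives the same result as applying the layer once per point and then gathering.

```python
import numpy as np

def knn_indices(points, k):
    """Brute-force KNN: indices of the k nearest neighbors of each point."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)          # exclude each point itself
    return np.argsort(d2, axis=1)[:, :k]

rng = np.random.default_rng(0)
points = rng.normal(size=(64, 3))         # N = 64 points in 3D
feats = rng.normal(size=(64, 16))         # one 16-dim feature per point
W = rng.normal(size=(16, 32))             # weights of one linear layer
idx = knn_indices(points, k=8)            # (N, K) neighbor indices

# Gather first, then transform: the layer is applied to N * K feature rows.
gather_then_linear = feats[idx] @ W       # shape (N, K, 32)

# Transform first, then gather: the layer is applied to only N feature rows.
linear_then_gather = (feats @ W)[idx]     # shape (N, K, 32)

print(np.allclose(gather_then_linear, linear_then_gather))
```

Applying the layer before gathering cuts the matrix-multiply work by roughly a factor of K in this setting. An elementwise activation commutes with gathering in the same way, while operations that mix neighbors before the layer may yield only similar rather than identical results.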
Abstract:We present a new method to capture detailed human motion, sampling more than 1000 unique points on the body. Our method outputs highly accurate 4D (spatio-temporal) point coordinates and, crucially, automatically assigns a unique label to each of the points. The locations and unique labels of the points are inferred from individual 2D input images only, without relying on temporal tracking or any human body shape or skeletal kinematics models. Therefore, our captured point trajectories contain all of the details from the input images, including motion due to breathing, muscle contractions and flesh deformation, and are well suited to be used as training data to fit advanced models of the human body and its motion. The key idea behind our system is a new type of motion capture suit which contains a special pattern with checkerboard-like corners and two-letter codes. The images from our multi-camera system are processed by a sequence of neural networks which are trained to localize the corners and recognize the codes, while being robust to suit stretching and self-occlusions of the body. Our system relies only on standard RGB or monochrome sensors and fully passive lighting and the passive suit, making our method easy to replicate, deploy and use. Our experiments demonstrate highly accurate captures of a wide variety of human poses, including challenging motions such as yoga, gymnastics, or rolling on the ground.
Abstract:Epipolar constraints are at the core of feature matching and depth estimation in current multi-person multi-camera 3D human pose estimation methods. Despite the satisfactory performance of this formulation in sparser crowd scenes, its effectiveness is frequently challenged under denser crowd circumstances mainly due to two sources of ambiguity. The first is the mismatch of human joints resulting from the simple cues provided by the Euclidean distances between joints and epipolar lines. The second is the lack of robustness from the naive formulation of the problem as a least squares minimization. In this paper, we depart from the multi-person 3D pose estimation formulation, and instead reformulate it as crowd pose estimation. Our method consists of two key components: a graph model for fast cross-view matching, and a maximum a posteriori (MAP) estimator for the reconstruction of the 3D human poses. We demonstrate the effectiveness and superiority of our proposed method on four benchmark datasets.
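The matching cue mentioned above, the Euclidean distance between a joint and an epipolar line, can be computed as in the following sketch. It illustrates the standard point-to-epipolar-line distance only, not the paper's graph model or MAP estimator. The fundamental matrix `F` used in the example is a hypothetical one, corresponding to two cameras with identity intrinsics related by a pure horizontal translation, and the function name `epipolar_distance` is an assumption.

```python
import numpy as np

def epipolar_distance(F, x1, x2):
    """Distance from point x2 (view 2) to the epipolar line of x1 (view 1).

    F is the 3x3 fundamental matrix mapping a homogeneous point in view 1
    to its epipolar line l = F @ x1 in view 2. x1 and x2 are 2D points.
    """
    x1h = np.append(np.asarray(x1, dtype=float), 1.0)
    x2h = np.append(np.asarray(x2, dtype=float), 1.0)
    line = F @ x1h                         # line coefficients (a, b, c)
    return abs(x2h @ line) / np.hypot(line[0], line[1])

# Hypothetical F for a pure translation along the x-axis: corresponding
# points must then lie on the same image row.
F = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])

joint_view1 = (0.2, 0.5)
same_row = epipolar_distance(F, joint_view1, (0.9, 0.5))
other_row = epipolar_distance(F, joint_view1, (0.9, 0.8))
print(same_row, other_row)
```

In a dense crowd, several candidate joints in the second view can lie close to the same epipolar line, so this distance alone cannot separate them. That is the first source of ambiguity the abstract describes.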