Abstract:Radio Frequency (RF) device fingerprinting has been recognized as a potential technology for enabling automated wireless device identification and classification. However, it faces a key challenge due to the domain shift that could arise from variations in the channel conditions and environmental settings, potentially degrading the accuracy of RF-based device classification when testing and training data is collected in different domains. This paper introduces a novel solution that leverages contrastive learning to mitigate this domain shift problem. Contrastive learning, a state-of-the-art self-supervised learning approach from deep learning, learns a distance metric such that positive pairs are closer (i.e. more similar) in the learned metric space than negative pairs. When applied to RF fingerprinting, our model treats RF signals from the same transmission as positive pairs and those from different transmissions as negative pairs. Through experiments on wireless and wired RF datasets collected over several days, we demonstrate that our contrastive learning approach captures domain-invariant features, diminishing the effects of domain-specific variations. Our results show large and consistent improvements in accuracy (10.8\% to 27.8\%) over baseline models, thus underscoring the effectiveness of contrastive learning in improving device classification under domain shift.
Abstract:Next-generation networks aim for comprehensive connectivity, interconnecting humans, machines, devices, and systems seamlessly. This interconnectivity raises concerns about privacy and security, given the potential network-wide impact of a single compromise. To address this challenge, the Zero Trust (ZT) paradigm emerges as a key method for safeguarding network integrity and data confidentiality. This work introduces EPS-CNN, a novel deep-learning-based wireless device identification framework designed to serve as the device authentication layer within the ZT architecture, with a focus on resource-constrained IoT devices. At the core of EPS-CNN, a Convolutional Neural Network (CNN) is utilized to generate the device identity from a unique RF signal representation, known as the Double-Sided Envelope Power Spectrum (EPS), which effectively captures the device-specific hardware characteristics while ignoring device-unrelated information. Experimental evaluations show that the proposed framework achieves over 99%, 93%, and 95% testing accuracy when tested in same-domain (day, location, and channel), cross-day, and cross-location scenarios, respectively. Our findings demonstrate the superiority of the proposed framework in enhancing the accuracy, robustness, and adaptability of deep learning-based methods, thus offering a pioneering solution for enabling ZT IoT device identification.
Abstract:Deep learning-enabled device fingerprinting has proven efficient in enabling automated identification and authentication of transmitting devices. It does so by leveraging the transmitters' unique features that are inherent to hardware impairments caused during manufacturing to extract device-specific signatures that can be exploited to uniquely distinguish and separate between (identical) devices. Though shown to achieve promising performances, hardware fingerprinting approaches are known to suffer greatly when the training data and the testing data are generated under different channels conditions that often change when time and/or location changes. To the best of our knowledge, this work is the first to use MIMO diversity to mitigate the impact of channel variability and provide a channel-resilient device identification over flat fading channels. Specifically, we show that MIMO can increase the device classification accuracy by up to about $50\%$ when model training and testing are done over the same channel and by up to about $70\%$ when training and testing are done over different fading channels.
Abstract:New capabilities in wireless network security have been enabled by deep learning, which leverages patterns in radio frequency (RF) data to identify and authenticate devices. Open-set detection is an area of deep learning that identifies samples captured from new devices during deployment that were not part of the training set. Past work in open-set detection has mostly been applied to independent and identically distributed data such as images. In contrast, RF signal data present a unique set of challenges as the data forms a time series with non-linear time dependencies among the samples. We introduce a novel open-set detection approach based on the patterns of the hidden state values within a Convolutional Neural Network (CNN) Long Short-Term Memory (LSTM) model. Our approach greatly improves the Area Under the Precision-Recall Curve on LoRa, Wireless-WiFi, and Wired-WiFi datasets, and hence, can be used successfully to monitor and control unauthorized network access of wireless devices.
Abstract:As the journey of 5G standardization is coming to an end, academia and industry have already begun to consider the sixth-generation (6G) wireless networks, with an aim to meet the service demands for the next decade. Deep learning-based RF fingerprinting (DL-RFFP) has recently been recognized as a potential solution for enabling key wireless network applications and services, such as spectrum policy enforcement and network access control. The state-of-the-art DL-RFFP frameworks suffer from a significant performance drop when tested with data drawn from a domain that is different from that used for training data. In this paper, we propose ADL-ID, an unsupervised domain adaption framework that is based on adversarial disentanglement representation to address the temporal domain adaptation for the RFFP task. Our framework has been evaluated on real LoRa and WiFi datasets and showed about 24% improvement in accuracy when compared to the baseline CNN network on short-term temporal adaptation. It also improves the classification accuracy by up to 9% on long-term temporal adaptation. Furthermore, we release a 5-day, 2.1TB, large-scale WiFi 802.11b dataset collected from 50 Pycom devices to support the research community efforts in developing and validating robust RFFP methods.
Abstract:Recent device fingerprinting approaches rely on deep learning to extract device-specific features solely from raw RF signals to identify, classify and authenticate wireless devices. One widely known issue lies in the inability of these approaches to maintain good performances when the training data and testing data are collected under varying deployment domains. For example, when the learning model is trained on data collected from one receiver but tested on data collected from a different receiver, the performance degrades substantially compared to when both training and testing data are collected using the same receiver. The same also happens when considering other varying domains, like channel condition and protocol configuration. In this paper, we begin by explaining, through testbed experiments, the challenges these fingerprinting techniques face when it comes to domain portability. We will then present some ideas on how to go about addressing these challenges so as to make deep learning-based device fingerprinting more resilient to domain variability.
Abstract:Deep-learning-based device fingerprinting has recently been recognized as a key enabler for automated network access authentication. Its robustness to impersonation attacks due to the inherent difficulty of replicating physical features is what distinguishes it from conventional cryptographic solutions. Although device fingerprinting has shown promising performances, its sensitivity to changes in the network operating environment still poses a major limitation. This paper presents an experimental framework that aims to study and overcome the sensitivity of LoRa-enabled device fingerprinting to such changes. We first begin by describing RF datasets we collected using our LoRa-enabled wireless device testbed. We then propose a new fingerprinting technique that exploits out-of-band distortion information caused by hardware impairments to increase the fingerprinting accuracy. Finally, we experimentally study and analyze the sensitivity of LoRa RF fingerprinting to various network setting changes. Our results show that fingerprinting does relatively well when the learning models are trained and tested under the same settings. However, when trained and tested under different settings, these models exhibit moderate sensitivity to channel condition changes and severe sensitivity to protocol configuration and receiver hardware changes when IQ data is used as input. However, with FFT data is used as input, they perform poorly under any change.
Abstract:Recent deep neural network-based device classification studies show that complex-valued neural networks (CVNNs) yield higher classification accuracy than real-valued neural networks (RVNNs). Although this improvement is (intuitively) attributed to the complex nature of the input RF data (i.e., IQ symbols), no prior work has taken a closer look into analyzing such a trend in the context of wireless device identification. Our study provides a deeper understanding of this trend using real LoRa and WiFi RF datasets. We perform a deep dive into understanding the impact of (i) the input representation/type and (ii) the architectural layer of the neural network. For the input representation, we considered the IQ as well as the polar coordinates both partially and fully. For the architectural layer, we considered a series of ablation experiments that eliminate parts of the CVNN components. Our results show that CVNNs consistently outperform RVNNs counterpart in the various scenarios mentioned above, indicating that CVNNs are able to make better use of the joint information provided via the in-phase (I) and quadrature (Q) components of the signal.
Abstract:Deep learning-based RF fingerprinting has recently been recognized as a potential solution for enabling newly emerging wireless network applications, such as spectrum access policy enforcement, automated network device authentication, and unauthorized network access monitoring and control. Real, comprehensive RF datasets are now needed more than ever to enable the study, assessment, and validation of newly developed RF fingerprinting approaches. In this paper, we present and release a large-scale RF fingerprinting dataset, collected from 25 different LoRa-enabled IoT transmitting devices using USRP B210 receivers. Our dataset consists of a large number of SigMF-compliant binary files representing the I/Q time-domain samples and their corresponding FFT-based files of LoRa transmissions. This dataset provides a comprehensive set of essential experimental scenarios, considering both indoor and outdoor environments and various network deployments and configurations, such as the distance between the transmitters and the receiver, the configuration of the considered LoRa modulation, the physical location of the conducted experiment, and the receiver hardware used for training and testing the neural network models.
Abstract:The accurate identification of wireless devices is critical for enabling automated network access monitoring and authenticated data communication in large-scale networks; e.g., IoT. RF fingerprinting has emerged as a solution for device identification by leveraging the transmitter unique manufacturing impairments. Although deep learning is proven efficient in classifying devices based on the hardware impairments fingerprints, DL models perform poorly due to channel variations. That is, although training and testing neural networks using data generated during the same period achieve reliable classification, testing them on data generated at different times degrades the accuracy substantially, an already well recognized problem within the community. To the best of our knowledge, we are the first to propose to leverage MIMO capabilities to mitigate the channel effect and provide a channel-resilient device classification. We show that for AWGN channels, combining multiple received signals improves the testing accuracy by up to $30\%$. We also show that for Rayleigh channels, blind channel estimation enabled by MIMO increases the testing accuracy by up to $40\%$ when the models are trained and tested over the same channel, and by up to $60\%$ when the models are tested on a channel that is different from that used for training.