Abstract:Only increasing accuracy without considering uncertainty may negatively impact Deep Neural Network (DNN) decision-making and decrease its reliability. This paper proposes five combined preprocessing and post-processing methods for time-series binary classification problems that simultaneously increase the accuracy and reliability of DNN outputs applied in a 5G UAV security dataset. These techniques use DNN outputs as input parameters and process them in different ways. Two methods use a well-known Machine Learning (ML) algorithm as a complement, and the other three use only confidence values that the DNN estimates. We compare seven different metrics, such as the Expected Calibration Error (ECE), Maximum Calibration Error (MCE), Mean Confidence (MC), Mean Accuracy (MA), Normalized Negative Log Likelihood (NLL), Brier Score Loss (BSL), and Reliability Score (RS) and the tradeoffs between them to evaluate the proposed hybrid algorithms. First, we show that the eXtreme Gradient Boosting (XGB) classifier might not be reliable for binary classification under the conditions this work presents. Second, we demonstrate that at least one of the potential methods can achieve better results than the classification in the DNN softmax layer. Finally, we show that the prospective methods may improve accuracy and reliability with better uncertainty calibration based on the assumption that the RS determines the difference between MC and MA metrics, and this difference should be zero to increase reliability. For example, Method 3 presents the best RS of 0.65 even when compared to the XGB classifier, which achieves RS of 7.22.
Abstract:Synthetic datasets are beneficial for machine learning researchers due to the possibility of experimenting with new strategies and algorithms in the training and testing phases. These datasets can easily include more scenarios that might be costly to research with real data or can complement and, in some cases, replace real data measurements, depending on the quality of the synthetic data. They can also solve the unbalanced data problem, avoid overfitting, and can be used in training while testing can be done with real data. In this paper, we present, to the best of our knowledge, the first synthetic dataset for Unmanned Aerial Vehicle (UAV) attacks in 5G and beyond networks based on the following key observable network parameters that indicate power levels: the Received Signal Strength Indicator (RSSI) and the Signal to Interference-plus-Noise Ratio (SINR). The main objective of this data is to enable deep network development for UAV communication security. Especially, for algorithm development or the analysis of time-series data applied to UAV attack recognition. Our proposed dataset provides insights into network functionality when static or moving UAV attackers target authenticated UAVs in an urban environment. The dataset also considers the presence and absence of authenticated terrestrial users in the network, which may decrease the deep networks ability to identify attacks. Furthermore, the data provides deeper comprehension of the metrics available in the 5G physical and MAC layers for machine learning and statistics research. The dataset will available at link archive-beta.ics.uci.edu
Abstract:When users exchange data with Unmanned Aerial vehicles - (UAVs) over air-to-ground (A2G) wireless communication networks, they expose the link to attacks that could increase packet loss and might disrupt connectivity. For example, in emergency deliveries, losing control information (i.e data related to the UAV control communication) might result in accidents that cause UAV destruction and damage to buildings or other elements in a city. To prevent these problems, these issues must be addressed in 5G and 6G scenarios. This research offers a deep learning (DL) approach for detecting attacks in UAVs equipped with orthogonal frequency division multiplexing (OFDM) receivers on Clustered Delay Line (CDL) channels in highly complex scenarios involving authenticated terrestrial users, as well as attackers in unknown locations. We use the two observable parameters available in 5G UAV connections: the Received Signal Strength Indicator (RSSI) and the Signal to Interference plus Noise Ratio (SINR). The prospective algorithm is generalizable regarding attack identification, which does not occur during training. Further, it can identify all the attackers in the environment with 20 terrestrial users. A deeper investigation into the timing requirements for recognizing attacks show that after training, the minimum time necessary after the attack begins is 100 ms, and the minimum attack power is 2 dBm, which is the same power that the authenticated UAV uses. Our algorithm also detects moving attackers from a distance of 500 m.
Abstract:Unmanned aerial vehicle (UAV) systems are vulnerable to jamming from self-interested users who utilize radio devices for their benefits during UAV transmissions. The vulnerability occurs due to the open nature of air-to-ground (A2G) wireless communication networks, which may enable network-wide attacks. This paper presents two strategies to identify Jammers in UAV networks. The first strategy is based on time series approaches for anomaly detection where the signal available in resource blocks are decomposed statistically to find trend, seasonality, and residues, while the second is based on newly designed deep networks. The joined technique is suitable for UAVs because the statistical model does not require heavy computation processing but is limited in generalizing possible attack's identification. On the other hand, the deep network can classify attacks accurately but requires more resources. The simulation considers the location and power of the jamming attacks and the UAV position related to the base station. The statistical method technique made it feasible to identify 84.38 % of attacks when the attacker was at 30 m from the UAV. Furthermore, the Deep network's accuracy was approximately 99.99 % for jamming powers greater than two and jammer distances less than 200 meters.
Abstract:Increasing the cardinality of categorical variables might decrease the overall performance of ML algorithms. This paper presents a novel computational preprocessing method to convert categorical to numerical variables for machine learning (ML) algorithms. In this method, We select and convert three categorical features to numerical features. First, we choose the threshold parameter based on the distribution of categories in variables. Then, we use conditional probabilities to convert each categorical variable into two new numerical variables, resulting in six new numerical variables in total. After that, we feed these six numerical variables to the Principal Component Analysis (PCA) algorithm. Next, we select the whole or partial numbers of Principal Components (PCs). Finally, by applying binary classification with ten different classifiers, We measured the performance of the new encoder and compared it with the other 17 well-known category encoders. The proposed technique achieved the highest performance related to accuracy and Area under the curve (AUC) on high cardinality categorical variables using the well-known cybersecurity NSLKDD dataset. Also, we defined harmonic average metrics to find the best trade-off between train and test performance and prevent underfitting and overfitting. Ultimately, the number of newly created numerical variables is minimal. Consequently, this data reduction improves computational processing time which might reduce processing data in 5G future telecommunication networks.