Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Theodoros Spyridopoulos

Gotham Dataset 2025: A Reproducible Large-Scale IoT Network Dataset for Intrusion Detection and Security Research

Feb 05, 2025

Othmane Belarbi, Theodoros Spyridopoulos, Eirini Anthi, Omer Rana, Pietro Carnelli, Aftab Khan

Abstract:In this paper, a dataset of IoT network traffic is presented. Our dataset was generated by utilising the Gotham testbed, an emulated large-scale Internet of Things (IoT) network designed to provide a realistic and heterogeneous environment for network security research. The testbed includes 78 emulated IoT devices operating on various protocols, including MQTT, CoAP, and RTSP. Network traffic was captured in Packet Capture (PCAP) format using tcpdump, and both benign and malicious traffic were recorded. Malicious traffic was generated through scripted attacks, covering a variety of attack types, such as Denial of Service (DoS), Telnet Brute Force, Network Scanning, CoAP Amplification, and various stages of Command and Control (C&C) communication. The data were subsequently processed in Python for feature extraction using the Tshark tool, and the resulting data was converted to Comma Separated Values (CSV) format and labelled. The data repository includes the raw network traffic in PCAP format and the processed labelled data in CSV format. Our dataset was collected in a distributed manner, where network traffic was captured separately for each IoT device at the interface between the IoT gateway and the device. Our dataset was collected in a distributed manner, where network traffic was separately captured for each IoT device at the interface between the IoT gateway and the device. With its diverse traffic patterns and attack scenarios, this dataset provides a valuable resource for developing Intrusion Detection Systems and security mechanisms tailored to complex, large-scale IoT environments. The dataset is publicly available at Zenodo.

* 16 pages, 7 figures, 4 tables. Submitted at the Data in Brief journal

Via

Access Paper or Ask Questions

Federated Deep Learning for Intrusion Detection in IoT Networks

Jun 07, 2023

Othmane Belarbi, Theodoros Spyridopoulos, Eirini Anthi, Ioannis Mavromatis, Pietro Carnelli, Aftab Khan

Abstract:The vast increase of IoT technologies and the ever-evolving attack vectors and threat actors have increased cyber-security risks dramatically. Novel attacks can compromise IoT devices to gain access to sensitive data or control them to deploy further malicious activities. The detection of novel attacks often relies upon AI solutions. A common approach to implementing AI-based IDS in distributed IoT systems is in a centralised manner. However, this approach may violate data privacy and secrecy. In addition, centralised data collection prohibits the scale-up of IDSs. Therefore, intrusion detection solutions in IoT ecosystems need to move towards a decentralised direction. FL has attracted significant interest in recent years due to its ability to perform collaborative learning while preserving data confidentiality and locality. Nevertheless, most FL-based IDS for IoT systems are designed under unrealistic data distribution conditions. To that end, we design an experiment representative of the real world and evaluate the performance of two FL IDS implementations, one based on DNNs and another on our previous work on DBNs. For our experiments, we rely on TON-IoT, a realistic IoT network traffic dataset, associating each IP address with a single FL client. Additionally, we explore pre-training and investigate various aggregation methods to mitigate the impact of data heterogeneity. Lastly, we benchmark our approach against a centralised solution. The comparison shows that the heterogeneous nature of the data has a considerable negative impact on the model performance when trained in a distributed manner. However, in the case of a pre-trained initial global FL model, we demonstrate a performance improvement of over 20% (F1-score) when compared against a randomly initiated global model.

* 14 pages, 5 figues, 3 tables

Via

Access Paper or Ask Questions

Hierarchical and Decentralised Federated Learning

Apr 28, 2023

Omer Rana, Theodoros Spyridopoulos, Nathaniel Hudson, Matt Baughman, Kyle Chard, Ian Foster, Aftab Khan

Abstract:Federated learning has shown enormous promise as a way of training ML models in distributed environments while reducing communication costs and protecting data privacy. However, the rise of complex cyber-physical systems, such as the Internet-of-Things, presents new challenges that are not met with traditional FL methods. Hierarchical Federated Learning extends the traditional FL process to enable more efficient model aggregation based on application needs or characteristics of the deployment environment (e.g., resource capabilities and/or network connectivity). It illustrates the benefits of balancing processing across the cloud-edge continuum. Hierarchical Federated Learning is likely to be a key enabler for a wide range of applications, such as smart farming and smart energy management, as it can improve performance and reduce costs, whilst also enabling FL workflows to be deployed in environments that are not well-suited to traditional FL. Model aggregation algorithms, software frameworks, and infrastructures will need to be designed and implemented to make such solutions accessible to researchers and engineers across a growing set of domains. H-FL also introduces a number of new challenges. For instance, there are implicit infrastructural challenges. There is also a trade-off between having generalised models and personalised models. If there exist geographical patterns for data (e.g., soil conditions in a smart farm likely are related to the geography of the region itself), then it is crucial that models used locally can consider their own locality in addition to a globally-learned model. H-FL will be crucial to future FL solutions as it can aggregate and distribute models at multiple levels to optimally serve the trade-off between locality dependence and global anomaly robustness.

* 11 pages, 6 figures, 25 references

Via

Access Paper or Ask Questions

LE3D: A Lightweight Ensemble Framework of Data Drift Detectors for Resource-Constrained Devices

Nov 18, 2022

Ioannis Mavromatis, Adrian Sanchez-Mompo, Francesco Raimondo, James Pope, Marcello Bullo, Ingram Weeks, Vijay Kumar, Pietro Carnelli, George Oikonomou, Theodoros Spyridopoulos(+1 more)

Figure 1 for LE3D: A Lightweight Ensemble Framework of Data Drift Detectors for Resource-Constrained Devices

Figure 2 for LE3D: A Lightweight Ensemble Framework of Data Drift Detectors for Resource-Constrained Devices

Figure 3 for LE3D: A Lightweight Ensemble Framework of Data Drift Detectors for Resource-Constrained Devices

Figure 4 for LE3D: A Lightweight Ensemble Framework of Data Drift Detectors for Resource-Constrained Devices

Abstract:Data integrity becomes paramount as the number of Internet of Things (IoT) sensor deployments increases. Sensor data can be altered by benign causes or malicious actions. Mechanisms that detect drifts and irregularities can prevent disruptions and data bias in the state of an IoT application. This paper presents LE3D, an ensemble framework of data drift estimators capable of detecting abnormal sensor behaviours. Working collaboratively with surrounding IoT devices, the type of drift (natural/abnormal) can also be identified and reported to the end-user. The proposed framework is a lightweight and unsupervised implementation able to run on resource-constrained IoT devices. Our framework is also generalisable, adapting to new sensor streams and environments with minimal online reconfiguration. We compare our method against state-of-the-art ensemble data drift detection frameworks, evaluating both the real-world detection accuracy as well as the resource utilisation of the implementation. Experimenting with real-world data and emulated drifts, we show the effectiveness of our method, which achieves up to 97% of detection accuracy while requiring minimal resources to run.

* IEEE CCNC 2023, Las Vegas, USA

Via

Access Paper or Ask Questions

An Intrusion Detection System based on Deep Belief Networks

Jul 05, 2022

Othmane Belarbi, Aftab Khan, Pietro Carnelli, Theodoros Spyridopoulos

Figure 1 for An Intrusion Detection System based on Deep Belief Networks

Figure 2 for An Intrusion Detection System based on Deep Belief Networks

Figure 3 for An Intrusion Detection System based on Deep Belief Networks

Figure 4 for An Intrusion Detection System based on Deep Belief Networks

Abstract:The rapid growth of connected devices has led to the proliferation of novel cyber-security threats known as zero-day attacks. Traditional behaviour-based IDS rely on DNN to detect these attacks. The quality of the dataset used to train the DNN plays a critical role in the detection performance, with underrepresented samples causing poor performances. In this paper, we develop and evaluate the performance of DBN on detecting cyber-attacks within a network of connected devices. The CICIDS2017 dataset was used to train and evaluate the performance of our proposed DBN approach. Several class balancing techniques were applied and evaluated. Lastly, we compare our approach against a conventional MLP model and the existing state-of-the-art. Our proposed DBN approach shows competitive and promising results, with significant performance improvement on the detection of attacks underrepresented in the training dataset.

* 15 pages, 4 figures, 4 tables. To be presented at the 4th International Conference on Science of Cyber Security (SciSec) in August 2022

Via

Access Paper or Ask Questions