José Camacho

Quality In / Quality Out: Assessing Data quality in an Anomaly Detection Benchmark

May 31, 2023

José Camacho, Katarzyna Wasielewska, Marta Fuentes-García, Rafael Rodríguez-Gómez

Abstract: Autonomous or self-driving networks are expected to provide a solution to the myriad of extremely demanding new applications in the Future Internet. The key to handling complexity is to perform tasks like network optimization and failure recovery with minimal human supervision. For this purpose, the community relies on the development of new Machine Learning (ML) models and techniques. However, ML can only be as good as the data it is fitted with. Datasets provided to the community as benchmarks for research purposes, which have a relevant impact on research findings and directions, are often assumed to be of good quality by default. In this paper, we show that relatively minor modifications to the same benchmark dataset (UGR'16, a flow-based real-traffic dataset for anomaly detection) have a significantly greater impact on model performance than the specific ML technique considered. To understand this finding, we contribute a methodology to investigate the root causes of those differences, and to assess the quality of the data labelling. Our findings illustrate the need to devote more attention to (automatic) data quality assessment and optimization techniques in the context of autonomous networks.

MSNM-S: An Applied Network Monitoring Tool for Anomaly Detection in Complex Networks and Systems

Jul 31, 2019

Roberto Magán-Carrión, José Camacho, Ángel Ruíz-Zafra

Abstract: Technology evolves quickly. Low-cost, ready-to-connect devices are designed to provide new services and applications that improve people's daily lives. Smart grids and smart healthcare systems are examples of such applications, all of them in the context of smart cities. In this all-connectivity scenario, security issues arise, since the larger the number of connected devices, the larger the attack surface. New solutions to monitor and detect security events are therefore needed, addressing the challenges of this scenario: among others, the number of devices to monitor, the huge amount of data to manage, and the real-time requirement for quick detection of security events and, consequently, quick reaction to attacks. In this work, we introduce the MSNM-Sensor, a practical and ready-to-use tool to monitor and detect security events that is able to manage this kind of environment. Although it is in its early development stages, experimental results based on the detection of well-known attacks in hierarchical network systems prove its suitability for more complex scenarios like the ones found in smart cities or IoT ecosystems.

Networkmetrics unraveled: MBDA in Action

Jul 05, 2019

José Camacho, Rasmus Bro, David Kotz

Abstract: We propose networkmetrics, a new data-driven approach for monitoring, troubleshooting and understanding communication networks using multivariate analysis. Networkmetric models are powerful machine-learning tools to interpret and interact with data collected from a network. In this paper, we illustrate the application of Multivariate Big Data Analysis (MBDA), a recently proposed networkmetric method with application to Big Data sets. We use MBDA for the detection and troubleshooting of network problems in a campus-wide Wi-Fi network. Data includes a seven-year trace (from 2012 to 2018) of the network's most recent activity, with approximately 3,000 distinct access points, 40,000 authenticated users, and 600,000 distinct Wi-Fi stations. This is the longest and largest Wi-Fi trace known to date. To analyze this data, we propose learning and visualization procedures that extend MBDA. These procedures result in a methodology that allows network analysts to identify problems and diagnose and troubleshoot them, optimizing the network performance. In the paper, we go through the entire workflow of the approach, illustrating its application in detail and discussing processing times for parallel hardware.

Cross-product Penalized Component Analysis (XCAN)

Jun 28, 2019

José Camacho, Evrim Acar, Morten A. Rasmussen, Rasmus Bro

Abstract: Matrix factorization methods are extensively employed to understand complex data. In this paper, we introduce the cross-product penalized component analysis (XCAN), a sparse matrix factorization based on the optimization of a loss function that allows a trade-off between variance maximization and structural preservation. The approach is based on previous developments, notably (i) the Sparse Principal Component Analysis (SPCA) framework based on the LASSO, (ii) extensions of SPCA to constrain both modes of the factorization, like co-clustering or the Penalized Matrix Decomposition (PMD), and (iii) the Group-wise Principal Component Analysis (GPCA) method. The result is a flexible modeling approach that can be used for data exploration in a large variety of problems. We demonstrate its use with applications from different disciplines.

Multivariate Big Data Analysis for Intrusion Detection: 5 steps from the haystack to the needle

Jun 27, 2019

José Camacho, José Manuel García-Giménez, Noemí Marta Fuentes-García, Gabriel Maciá-Fernández

Abstract: The research literature on cybersecurity incident detection and response is very rich in automatic detection methodologies, in particular those based on the anomaly detection paradigm. However, very little attention has been devoted to the diagnosis ability of the methods, that is, their ability to provide useful information on the causes of a given detected anomaly. This information is of utmost importance for the security team to reduce the time from detection to response. In this paper, we present Multivariate Big Data Analysis (MBDA), a complete intrusion detection approach based on 5 steps to effectively handle massive amounts of disparate data sources. The approach has been designed to deal with the main characteristics of Big Data, that is, high volume, velocity and variety. The core of the approach is the Multivariate Statistical Network Monitoring (MSNM) technique proposed in a recent paper. Unlike state-of-the-art machine learning methodologies applied to the intrusion detection problem, when an anomaly is identified in MBDA, the output of the system includes the detail of the logs of raw information associated with this anomaly, so that the security team can use this information to elucidate its root causes. MBDA is based on two open software packages available on GitHub: the MEDA Toolbox and the FCParser. We illustrate our approach with two case studies. The first one demonstrates the application of MBDA to semistructured sources of information, using the data from the VAST 2012 mini challenge 2. This complete case study is supplied in a virtual machine available for download. In the second case study, we show the Big Data capabilities of the approach on data collected from a real network with labeled attacks.
