Fernando Berzal

Differential Privacy in Machine Learning: From Symbolic AI to LLMs

Jun 13, 2025

Francisco Aguilera-Martínez, Fernando Berzal

Abstract: Machine learning (ML) models should not reveal particular information that is not otherwise accessible. Differential privacy (DP) provides a formal framework to mitigate privacy risks by ensuring that the inclusion or exclusion of any single data point does not significantly alter the output of an algorithm, thus limiting the exposure of private information. This survey paper explores the foundational definitions of differential privacy, reviews its original formulations, and traces its evolution through key research contributions. It then provides an in-depth examination of how DP has been integrated into machine learning models, analyzing existing proposals and methods to preserve privacy when training ML models. Finally, it describes how DP-based ML techniques can be evaluated in practice. By offering a comprehensive overview of differential privacy in machine learning, this work aims to contribute to the ongoing development of secure and responsible AI systems.

* arXiv admin note: text overlap with arXiv:2303.00654 by other authors
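
Formally, a randomized mechanism M is ε-differentially private if, for every pair of datasets D and D' that differ in a single record and every set of outputs S, Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D') ∈ S]. The minimal Python sketch below, which is not taken from the survey, illustrates the classic Laplace mechanism for a counting query. Adding or removing one record changes a count by at most 1, so adding Laplace noise with scale 1/ε satisfies ε-DP. The function names `laplace_noise` and `private_count` are hypothetical. Python's `random` module is used only for illustration; it is not a cryptographically secure randomness source and should not back a production DP system.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale).

    The difference of two i.i.d. exponential variables with rate 1/scale
    is Laplace-distributed with location 0 and the given scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Release the number of records satisfying `predicate` with epsilon-DP.

    A count has L1 sensitivity 1 (one record changes it by at most 1),
    so Laplace noise with scale sensitivity / epsilon is sufficient.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    ages = [23, 35, 41, 29, 62, 57, 33]
    rng = random.Random(0)
    # Smaller epsilon means more noise and a stronger privacy guarantee.
    for eps in (0.1, 1.0, 10.0):
        print(eps, private_count(ages, lambda a: a > 30, eps, rng))
```

Each released value is the true count (5 in this example) plus fresh noise; repeated releases consume additional privacy budget, which this sketch does not track.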

LLM Security: Vulnerabilities, Attacks, Defenses, and Countermeasures

May 02, 2025

Francisco Aguilera-Martínez, Fernando Berzal

Abstract: As large language models (LLMs) continue to evolve, it is critical to assess the security threats and vulnerabilities that may arise both during their training phase and after models have been deployed. This survey seeks to define and categorize the various attacks targeting LLMs, distinguishing between those that occur during the training phase and those that affect already trained models. A thorough analysis of these attacks is presented, alongside an exploration of defense mechanisms designed to mitigate such threats. Defenses are classified into two primary categories: prevention-based and detection-based defenses. Furthermore, our survey summarizes possible attacks and their corresponding defense strategies. It also provides an evaluation of the effectiveness of the known defense mechanisms for the different security threats. Our survey aims to offer a structured framework for securing LLMs, while also identifying areas that require further research to improve and strengthen defenses against emerging security challenges.

Differential Privacy Regularization: Protecting Training Data Through Loss Function Regularization

Sep 25, 2024

Francisco Aguilera-Martínez, Fernando Berzal

Abstract: Training machine learning models based on neural networks requires large datasets, which may contain sensitive information. The models, however, should not expose private information from these datasets. Differentially private SGD (DP-SGD) requires modifying the standard stochastic gradient descent (SGD) algorithm used to train new models. In this short paper, a novel regularization strategy is proposed to achieve the same goal more efficiently.
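
For context, the standard DP-SGD recipe (Abadi et al., 2016) changes each SGD step in two ways: every per-example gradient is clipped to an L2 norm of at most C, and Gaussian noise with standard deviation σ·C is added to the sum of the clipped gradients before averaging. The pure-Python sketch below illustrates one such step for linear regression with squared loss. It shows the baseline that the paper seeks to replace, not the paper's proposed regularization strategy, which the abstract does not specify. All function names are hypothetical, batches are drawn with `random.sample` rather than the Poisson sampling assumed by DP-SGD privacy accounting, and no accountant computes the resulting ε.

```python
import math
import random


def clip(grad, max_norm):
    """Scale a per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    factor = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * factor for g in grad]


def squared_loss_grad(w, x, y):
    """Gradient of 0.5 * (w.x - y)^2 with respect to w for one example."""
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [err * xi for xi in x]


def dp_sgd_step(w, batch, lr, max_norm, noise_multiplier, rng):
    """One DP-SGD update: clip each per-example gradient, sum, add noise, average."""
    summed = [0.0] * len(w)
    for x, y in batch:
        g = clip(squared_loss_grad(w, x, y), max_norm)
        summed = [s + gi for s, gi in zip(summed, g)]
    sigma = noise_multiplier * max_norm  # noise std scales with the clipping bound
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [wi - lr * ni / len(batch) for wi, ni in zip(w, noisy)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic data for y = 3 + 2*x; the leading 1.0 is a bias feature.
    data = [([1.0, x], 3.0 + 2.0 * x) for x in (rng.uniform(-1, 1) for _ in range(1000))]
    w = [0.0, 0.0]
    for _ in range(500):
        batch = rng.sample(data, 64)
        w = dp_sgd_step(w, batch, lr=0.5, max_norm=1.0, noise_multiplier=1.0, rng=rng)
    print(w)
```

Clipping bounds each example's influence on the update, which is what lets the added Gaussian noise mask any single record; the per-example work inside the loop is the overhead that a regularization-based alternative would aim to avoid.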

Enhancing Community Detection in Networks: A Comparative Analysis of Local Metrics and Hierarchical Algorithms

Aug 17, 2024

Julio-Omar Palacio-Niño, Fernando Berzal

Abstract: The analysis and detection of communities in network structures are becoming increasingly relevant for understanding social behavior. One of the principal challenges in this field is the complexity of existing algorithms. The Girvan-Newman algorithm, which uses the betweenness metric as a measure of node similarity, is one of the most representative algorithms in this area. This study employs the same method to evaluate the relevance of using local similarity metrics for community detection. A series of local metrics were tested on a set of networks constructed using the basic Girvan-Newman algorithm. The efficacy of these metrics was evaluated by applying the base algorithm to several real networks with varying community sizes, using modularity and normalized mutual information (NMI). The results indicate that approaches based on local similarity metrics have significant potential for community detection.
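
Modularity, one of the two quality measures mentioned above, compares the fraction of edges that fall inside communities with the fraction expected if edges were placed at random while preserving node degrees. For an undirected graph with m edges it can be written as Q = Σ_c [L_c / m - (d_c / (2m))²], where L_c is the number of edges inside community c and d_c is the total degree of its nodes. The following minimal Python sketch implements that formula; the `modularity` function name and the edge-list input format are assumptions, not the study's code. NMI, which compares a detected partition with a reference partition, is not shown.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping node -> community label.
    Uses Q = sum over communities c of [ L_c / m - (d_c / (2m))^2 ].
    """
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")
    internal = defaultdict(int)    # L_c: edges with both endpoints in c
    degree_sum = defaultdict(int)  # d_c: total degree of the nodes in c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


if __name__ == "__main__":
    # Two triangles joined by the bridge edge (2, 3).
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
    print(modularity(edges, split))  # 5/14, about 0.357
```

Putting every node in a single community always gives Q = 0, so positive values indicate more internal edges than the degree-preserving random baseline would predict.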

On the use of local structural properties for improving the efficiency of hierarchical community detection methods

Sep 15, 2020

Julio-Omar Palacio-Niño, Fernando Berzal

Abstract: Community detection is a fundamental problem in the analysis of complex networks. It is the analogue of clustering in network data mining. Within community detection methods, hierarchical algorithms are popular. However, their iterative nature and the need to recompute the structural properties used to split the network (e.g., edge betweenness in Girvan and Newman's algorithm) make them unsuitable for large network data sets. In this paper, we study how local structural network properties can be used as proxies to improve the efficiency of hierarchical community detection while, at the same time, achieving competitive results in terms of modularity. In particular, we study the potential use of the structural properties commonly used to perform local link prediction, a supervised learning problem where community structure is relevant, as nodes are prone to establish new links with other nodes within their communities. In addition, we check the performance impact of network pruning heuristics as an ancillary tactic to make hierarchical community detection more efficient.
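
To illustrate the general idea of replacing edge betweenness with a local structural property, the sketch below runs a divisive, Girvan-Newman-style loop that repeatedly removes the edge whose endpoints have the lowest Jaccard coefficient |N(u) ∩ N(v)| / |N(u) ∪ N(v)|, a score commonly used for local link prediction. The choice of Jaccard, the stopping rule (stop once there are k connected components), and all function names are assumptions for illustration; the paper studies several such properties and this is not its exact procedure. For clarity, the sketch rescores every edge in each round. Because the score depends only on the two endpoint neighborhoods, only edges adjacent to the removed one actually change, which is what makes local scores cheaper to maintain than betweenness.

```python
from collections import deque


def jaccard(adj, u, v):
    """Jaccard coefficient of the neighborhoods of u and v."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def components(adj):
    """Connected components of an undirected graph given as adjacency sets."""
    seen, result = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            comp.add(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        result.append(comp)
    return result


def divisive_local(edges, k):
    """Remove the least-similar edge until the graph has at least k components."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    while len(components(adj)) < k:
        # Each undirected edge once (assumes comparable node labels).
        remaining = [(u, v) for u in adj for v in adj[u] if u < v]
        if not remaining:
            break
        u, v = min(remaining, key=lambda e: jaccard(adj, *e))
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)


if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    print(divisive_local(edges, 2))
```

On two triangles joined by a single bridge edge, the bridge endpoints share no neighbors, so the bridge has Jaccard coefficient 0, is removed first, and the graph splits into the two triangles.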

Evaluation Metrics for Unsupervised Learning Algorithms

May 23, 2019

Julio-Omar Palacio-Niño, Fernando Berzal

Abstract: Determining the quality of the results obtained by clustering techniques is a key issue in unsupervised machine learning. Many authors have discussed the desirable features of good clustering algorithms. However, Jon Kleinberg established an impossibility theorem for clustering. As a consequence, a wealth of studies have proposed techniques to evaluate the quality of clustering results depending on the characteristics of the clustering problem and the algorithmic technique employed to cluster data.

* Technical Report
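
As an example of an internal quality measure, one that needs no ground-truth labels, the silhouette coefficient compares, for each point, its mean distance a to the other members of its own cluster with the smallest mean distance b to the members of any other cluster, giving s = (b - a) / max(a, b). Values near 1 indicate compact, well-separated clusters; negative values suggest points assigned to the wrong cluster. The pure-Python sketch below assumes Euclidean distance and assigns 0 to points in singleton clusters; it is a generic illustration rather than code from the report. In practice, scikit-learn provides `sklearn.metrics.silhouette_score`.

```python
import math


def silhouette(points, labels):
    """Mean silhouette coefficient of a clustering, using Euclidean distance."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, points[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    print(silhouette(pts, [0, 0, 1, 1]))  # well separated: close to 0.9
    print(silhouette(pts, [0, 1, 0, 1]))  # poor assignment: negative
```

Like any single index, the silhouette encodes a particular notion of a good clustering (here, compact and separated groups), which is one reason many evaluation techniques coexist.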

The NOESIS Network-Oriented Exploration, Simulation, and Induction System

Jun 23, 2017

Víctor Martínez, Fernando Berzal, Juan-Carlos Cubero

Abstract: Network data mining has become an important area of study due to the large number of problems it can be applied to. This paper presents NOESIS, an open source framework for network data mining that provides a large collection of network analysis techniques, including the analysis of network structural properties, community detection methods, link scoring, and link prediction, as well as network visualization algorithms. It also features a complete stand-alone graphical user interface that facilitates the use of all these techniques. The NOESIS framework has been designed using solid object-oriented design principles and structured parallel programming. As a lightweight library with minimal external dependencies and a permissive software license, NOESIS can be incorporated into other software projects. Released under a BSD license, it is available from http://noesis.ikor.org.

A Model-Driven Probabilistic Parser Generator

May 14, 2012

Luis Quesada, Fernando Berzal, Francisco J. Cortijo

Abstract: Existing probabilistic scanners and parsers impose hard constraints on the way lexical and syntactic ambiguities can be resolved. Furthermore, traditional grammar-based parsing tools are limited in the mechanisms they allow for taking context into account. In this paper, we propose a model-driven tool that allows for statistical language models with arbitrary probability estimators. Our work on model-driven probabilistic parsing is built on top of ModelCC, a model-based parser generator, and enables the probabilistic interpretation and resolution of anaphoric, cataphoric, and recursive references in the disambiguation of abstract syntax graphs. To demonstrate the expressive power of ModelCC, we describe the design of a general-purpose natural language parser.

A Lexical Analysis Tool with Ambiguity Support

Feb 29, 2012

Luis Quesada, Fernando Berzal, Francisco J. Cortijo

Abstract: Lexical ambiguities naturally arise in languages. We present Lamb, a lexical analyzer that produces a lexical analysis graph describing all the possible sequences of tokens that can be found within the input string. Parsers can process such lexical analysis graphs and discard any sequence of tokens that does not produce a valid syntactic sentence, therefore performing, together with Lamb, a context-sensitive lexical analysis in lexically-ambiguous language specifications.
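
The idea of a lexical analysis graph can be sketched as follows: nodes are positions in the input, and each edge records a token type that matches between two positions, so every path from the start to the end of the input is one candidate token sequence. This minimal Python sketch is not Lamb's implementation. It assumes token types are given as regular expressions, takes only each pattern's greedy match at a given position, and does no whitespace handling. The names `lexical_graph` and `token_sequences` are hypothetical.

```python
import re
from collections import defaultdict


def lexical_graph(text, token_specs):
    """Build edges (start -> [(end, token_type, lexeme)]) for every token type
    whose pattern matches at a position reachable from the start of `text`."""
    compiled = [(name, re.compile(pattern)) for name, pattern in token_specs]
    edges = defaultdict(list)
    pending, visited = [0], set()
    while pending:
        pos = pending.pop()
        if pos in visited or pos == len(text):
            continue
        visited.add(pos)
        for name, regex in compiled:
            m = regex.match(text, pos)  # anchored match starting at pos
            if m and m.end() > pos:     # ignore empty matches
                edges[pos].append((m.end(), name, m.group()))
                pending.append(m.end())
    return edges


def token_sequences(text, edges, pos=0):
    """Enumerate every token sequence (path in the graph) covering text[pos:]."""
    if pos == len(text):
        yield []
        return
    for end, name, lexeme in edges.get(pos, []):
        for rest in token_sequences(text, edges, end):
            yield [(name, lexeme)] + rest


if __name__ == "__main__":
    specs = [("ID", r"[a-z]+"), ("INT", r"-?[0-9]+"), ("MINUS", r"-")]
    graph = lexical_graph("x-1", specs)
    for seq in token_sequences("x-1", graph):
        print(seq)
```

For the input `x-1`, with an `INT` pattern that allows a leading minus sign, the graph yields two sequences, `ID INT` and `ID MINUS INT`; a parser would then discard whichever one is not syntactically valid. Enumerating all paths can grow exponentially with the input length, so a parser would normally consume the graph directly rather than list every sequence.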

A Constraint-Satisfaction Parser for Context-Free Grammars

Feb 02, 2012

Luis Quesada, Fernando Berzal, Francisco J. Cortijo

Abstract: Traditional language processing tools constrain language designers to specific kinds of grammars. In contrast, model-based language specification decouples language design from language processing. As a consequence, model-based language specification tools need general parsers able to parse unrestricted context-free grammars. As languages specified following this approach may be ambiguous, parsers must deal with ambiguities. Model-based language specification also allows the definition of associativity, precedence, and custom constraints. Therefore, parsers generated by model-driven language specification tools need to enforce constraints. In this paper, we propose Fence, an efficient bottom-up chart parser with lexical and syntactic ambiguity support that allows the specification of constraints and, therefore, enables the use of model-based language specification in practice.
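
Fence itself handles unrestricted context-free grammars. The sketch below swaps in a much simpler technique, the CYK algorithm, which requires a grammar in Chomsky normal form, only to show how a bottom-up chart parser can enforce constraints by vetoing chart entries as they are built. The `constraint` callback, the tuple encoding of trees, and the example grammar are assumptions for illustration, not Fence's design. With the ambiguous grammar E → E R, R → P E, E → n, P → +, the input `n + n + n` has two parses; a constraint that forbids a compound expression to the right of `+` keeps only the left-associative one.

```python
from itertools import product


def cyk_parse(tokens, unary, binary, start, constraint=lambda lhs, left, right: True):
    """Bottom-up CYK chart parser returning all parse trees rooted at `start`.

    unary:  dict terminal -> list of nonterminals A with a rule A -> terminal
    binary: list of (A, B, C) rules A -> B C (Chomsky normal form)
    constraint: predicate that may veto building A from two subtrees.
    Trees are tuples: (A, terminal) for leaves, (A, left, right) otherwise.
    """
    n = len(tokens)
    chart = [[[] for _ in range(n + 1)] for _ in range(n + 1)]
    for i, tok in enumerate(tokens):
        chart[i][i + 1] = [(a, tok) for a in unary.get(tok, [])]
    for length in range(2, n + 1):          # span length
        for i in range(n - length + 1):     # span start
            j = i + length
            for k in range(i + 1, j):       # split point
                for left, right in product(chart[i][k], chart[k][j]):
                    for a, b, c in binary:
                        if left[0] == b and right[0] == c and constraint(a, left, right):
                            chart[i][j].append((a, left, right))
    return [t for t in chart[0][n] if t[0] == start]


if __name__ == "__main__":
    tokens = ["n", "+", "n", "+", "n"]
    unary = {"n": ["E"], "+": ["P"]}
    binary = [("E", "E", "R"), ("R", "P", "E")]
    print(len(cyk_parse(tokens, unary, binary, "E")))  # 2 (ambiguous)

    # Left associativity: the operand after '+' must be a leaf expression.
    def left_assoc(lhs, left, right):
        return not (lhs == "R" and len(right) == 3)

    print(cyk_parse(tokens, unary, binary, "E", left_assoc))
```

Storing every tree in each chart cell can grow exponentially on highly ambiguous input; practical chart parsers share subtrees in a packed parse forest instead, and applying constraints during construction, as here, prunes alternatives before they propagate to larger spans.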
