Abstract:Understanding the internal mechanisms of large language models (LLMs) is integral to enhancing their reliability, interpretability, and inference processes. We present Constituent-Aware Pooling (CAP), a methodology designed to analyse how LLMs process compositional linguistic structures. Grounded in principles of compositionality, mechanistic interpretability, and information gain theory, CAP systematically intervenes in model activations through constituent-based pooling at various model levels. Our experiments on inverse definition modelling, hypernym and synonym prediction reveal critical insights into transformers' limitations in handling compositional abstractions. No specific layer integrates tokens into unified semantic representations based on their constituent parts. We observe fragmented information processing, which intensifies with model size, suggesting that larger models struggle more with these interventions and exhibit greater information dispersion. This fragmentation likely stems from transformers' training objectives and architectural design, preventing systematic and cohesive representations. Our findings highlight fundamental limitations in current transformer architectures regarding compositional semantics processing and model interpretability, underscoring the critical need for novel approaches in LLM design to address these challenges.
Abstract:Locating and editing knowledge in large language models (LLMs) is crucial for enhancing their accuracy, safety, and inference rationale. We introduce ``concept editing'', an innovative variation of knowledge editing that uncovers conceptualisation mechanisms within these models. Using the reverse dictionary task, inference tracing, and input abstraction, we analyse the Multi-Layer Perceptron (MLP), Multi-Head Attention (MHA), and hidden state components of transformer models. Our results reveal distinct patterns: MLP layers employ key-value retrieval mechanism and context-dependent processing, which are highly associated with relative input tokens. MHA layers demonstrate a distributed nature with significant higher-level activations, suggesting sophisticated semantic integration. Hidden states emphasise the importance of the last token and top layers in the inference process. We observe evidence of gradual information building and distributed representation. These observations elucidate how transformer models process semantic information, paving the way for targeted interventions and improved interpretability techniques. Our work highlights the complex, layered nature of semantic processing in LLMs and the challenges of isolating and modifying specific concepts within these models.
Abstract:Federated learning enables multiple clients to collaboratively contribute to the learning of a global model orchestrated by a central server. This learning scheme promotes clients' data privacy and requires reduced communication overheads. In an application like network traffic classification, this helps hide the network vulnerabilities and weakness points. However, federated learning is susceptible to backdoor attacks, in which adversaries inject manipulated model updates into the global model. These updates inject a salient functionality in the global model that can be launched with specific input patterns. Nonetheless, the vulnerability of network traffic classification models based on federated learning to these attacks remains unexplored. In this paper, we propose GABAttack, a novel genetic algorithm-based backdoor attack against federated learning for network traffic classification. GABAttack utilizes a genetic algorithm to optimize the values and locations of backdoor trigger patterns, ensuring a better fit with the input and the model. This input-tailored dynamic attack is promising for improved attack evasiveness while being effective. Extensive experiments conducted over real-world network datasets validate the success of the proposed GABAttack in various situations while maintaining almost invisible activity. This research serves as an alarming call for network security experts and practitioners to develop robust defense measures against such attacks.