Abstract:Currently, state-of-the-art RL methods excel in single-task settings, but they still struggle to generalize across multiple tasks due to catastrophic forgetting challenges, where previously learned tasks are forgotten as new tasks are introduced. This multi-task learning capability is significantly important for generalist agents, where adaptation features are highly required (e.g., autonomous robots). On the other hand, Spiking Neural Networks (SNNs) have emerged as alternative energy-efficient neural network algorithms due to their sparse spike-based operations. Toward this, we propose MTSpark, a novel methodology to enable multi-task RL using spiking networks. Specifically, MTSpark develops a Deep Spiking Q-Network (DSQN) with active dendrites and dueling structure by leveraging task-specific context signals. Specifically, each neuron computes task-dependent activations that dynamically modulate inputs, forming specialized sub-networks for each task. Moreover, this bioplausible network model also benefits from SNNs, enhancing energy efficiency and making the model suitable for hardware implementation. Experimental results show that, our MTSpark effectively learns multiple tasks with higher performance compared to the state-of-the-art. Specifically, MTSpark successfully achieves high score in three Atari games (i.e., Pong: -5.4, Breakout: 0.6, and Enduro: 371.2), reaching human-level performance (i.e., Pong: -3, Breakout: 31, and Enduro: 368), where state-of-the-art struggle to achieve. In addition, our MTSpark also shows better accuracy in image classification tasks than the state-of-the-art. These results highlight the potential of our MTSpark methodology to develop generalist agents that can learn multiple tasks by leveraging both RL and SNN concepts.
Abstract:Large Language Models (LLMs) represent a class of deep learning models adept at understanding natural language and generating coherent responses to various prompts or queries. These models far exceed the complexity of conventional neural networks, often encompassing dozens of neural network layers and containing billions to trillions of parameters. They are typically trained on vast datasets, utilizing architectures based on transformer blocks. Present-day LLMs are multi-functional, capable of performing a range of tasks from text generation and language translation to question answering, as well as code generation and analysis. An advanced subset of these models, known as Multimodal Large Language Models (MLLMs), extends LLM capabilities to process and interpret multiple data modalities, including images, audio, and video. This enhancement empowers MLLMs with capabilities like video editing, image comprehension, and captioning for visual content. This survey provides a comprehensive overview of the recent advancements in LLMs. We begin by tracing the evolution of LLMs and subsequently delve into the advent and nuances of MLLMs. We analyze emerging state-of-the-art MLLMs, exploring their technical features, strengths, and limitations. Additionally, we present a comparative analysis of these models and discuss their challenges, potential limitations, and prospects for future development.
Abstract:Predicting loan eligibility with high accuracy remains a significant challenge in the finance sector. Accurate predictions enable financial institutions to make informed decisions, mitigate risks, and effectively adapt services to meet customer needs. However, the complexity and the high-dimensional nature of financial data have always posed significant challenges to achieving this level of precision. To overcome these issues, we propose a novel approach that employs Quantum Machine Learning (QML) for Loan Eligibility Prediction using Quantum Neural Networks (LEP-QNN).Our innovative approach achieves an accuracy of 98% in predicting loan eligibility from a single, comprehensive dataset. This performance boost is attributed to the strategic implementation of a dropout mechanism within the quantum circuit, aimed at minimizing overfitting and thereby improving the model's predictive reliability. In addition, our exploration of various optimizers leads to identifying the most efficient setup for our LEP-QNN framework, optimizing its performance. We also rigorously evaluate the resilience of LEP-QNN under different quantum noise scenarios, ensuring its robustness and dependability for quantum computing environments. This research showcases the potential of QML in financial predictions and establishes a foundational guide for advancing QML technologies, marking a step towards developing advanced, quantum-driven financial decision-making tools.
Abstract:The integration of fully homomorphic encryption (FHE) in federated learning (FL) has led to significant advances in data privacy. However, during the aggregation phase, it often results in performance degradation of the aggregated model, hindering the development of robust representational generalization. In this work, we propose a novel multimodal quantum federated learning framework that utilizes quantum computing to counteract the performance drop resulting from FHE. For the first time in FL, our framework combines a multimodal quantum mixture of experts (MQMoE) model with FHE, incorporating multimodal datasets for enriched representation and task-specific learning. Our MQMoE framework enhances performance on multimodal datasets and combined genomics and brain MRI scans, especially for underrepresented categories. Our results also demonstrate that the quantum-enhanced approach mitigates the performance degradation associated with FHE and improves classification accuracy across diverse datasets, validating the potential of quantum interventions in enhancing privacy in FL.
Abstract:The proliferation of smartphones and other mobile devices provides a unique opportunity to make Advanced Driver Assistance Systems (ADAS) accessible to everyone in the form of an application empowered by low-cost Machine/Deep Learning (ML/DL) models to enhance road safety. For the critical feature of Collision Avoidance in Mobile ADAS, lightweight Deep Neural Networks (DNN) for object detection exist, but conventional pixel-wise depth/distance estimation DNNs are vastly more computationally expensive making them unsuitable for a real-time application on resource-constrained devices. In this paper, we present a distance estimation model, DECADE, that processes each detector output instead of constructing pixel-wise depth/disparity maps. In it, we propose a pose estimation DNN to estimate allocentric orientation of detections to supplement the distance estimation DNN in its prediction of distance using bounding box features. We demonstrate that these modules can be attached to any detector to extend object detection with fast distance estimation. Evaluation of the proposed modules with attachment to and fine-tuning on the outputs of the YOLO object detector on the KITTI 3D Object Detection dataset achieves state-of-the-art performance with 1.38 meters in Mean Absolute Error and 7.3% in Mean Relative Error in the distance range of 0-150 meters. Our extensive evaluation scheme not only evaluates class-wise performance, but also evaluates range-wise accuracy especially in the critical range of 0-70m.
Abstract:To adapt to real-world dynamics, intelligent systems need to assimilate new knowledge without catastrophic forgetting, where learning new tasks leads to a degradation in performance on old tasks. To address this, continual learning concept is proposed for enabling autonomous systems to acquire new knowledge and dynamically adapt to changing environments. Specifically, energy-efficient continual learning is needed to ensure the functionality of autonomous systems under tight compute and memory resource budgets (i.e., so-called autonomous embedded systems). Neuromorphic computing, with brain-inspired Spiking Neural Networks (SNNs), offers inherent advantages for enabling low-power/energy continual learning in autonomous embedded systems. In this paper, we comprehensively discuss the foundations and methods for enabling continual learning in neural networks, then analyze the state-of-the-art works considering SNNs. Afterward, comparative analyses of existing methods are conducted while considering crucial design factors, such as network complexity, memory, latency, and power/energy efficiency. We also explore the practical applications that can benefit from SNN-based continual learning and open challenges in real-world scenarios. In this manner, our survey provides valuable insights into the recent advancements of SNN-based continual learning for real-world application use-cases.
Abstract:Autonomous vehicles (AVs) rely heavily on LiDAR (Light Detection and Ranging) systems for accurate perception and navigation, providing high-resolution 3D environmental data that is crucial for object detection and classification. However, LiDAR systems are vulnerable to adversarial attacks, which pose significant challenges to the safety and robustness of AVs. This survey presents a thorough review of the current research landscape on physical adversarial attacks targeting LiDAR-based perception systems, covering both single-modality and multi-modality contexts. We categorize and analyze various attack types, including spoofing and physical adversarial object attacks, detailing their methodologies, impacts, and potential real-world implications. Through detailed case studies and analyses, we identify critical challenges and highlight gaps in existing attacks for LiDAR-based systems. Additionally, we propose future research directions to enhance the security and resilience of these systems, ultimately contributing to the safer deployment of autonomous vehicles.
Abstract:Although language model (LM) agents are demonstrating growing potential in many domains, their success in cybersecurity has been limited due to simplistic design and the lack of fundamental features for this domain. We present EnIGMA, an LM agent for autonomously solving Capture The Flag (CTF) challenges. EnIGMA introduces new Agent-Computer Interfaces (ACIs) to improve the success rate on CTF challenges. We establish the novel Interactive Agent Tool concept, which enables LM agents to run interactive command-line utilities essential for these challenges. Empirical analysis of EnIGMA on over 350 CTF challenges from three different benchmarks indicates that providing a robust set of new tools with demonstration of their usage helps the LM solve complex problems and achieves state-of-the-art results on the NYU CTF and Intercode-CTF benchmarks. Finally, we discuss insights on ACI design and agent behavior on cybersecurity tasks that highlight the need to adapt real-world tools for LM agents.
Abstract:Optimizing Deep Learning-based Simultaneous Localization and Mapping (DL-SLAM) algorithms is essential for efficient implementation on resource-constrained embedded platforms, enabling real-time on-board computation in autonomous mobile robots. This paper presents SPAQ-DL-SLAM, a framework that strategically applies Structured Pruning and Quantization (SPAQ) to the architecture of one of the state-ofthe-art DL-SLAM algorithms, DROID-SLAM, for resource and energy-efficiency. Specifically, we perform structured pruning with fine-tuning based on layer-wise sensitivity analysis followed by 8-bit post-training static quantization (PTQ) on the deep learning modules within DROID-SLAM. Our SPAQ-DROIDSLAM model, optimized version of DROID-SLAM model using our SPAQ-DL-SLAM framework with 20% structured pruning and 8-bit PTQ, achieves an 18.9% reduction in FLOPs and a 79.8% reduction in overall model size compared to the DROID-SLAM model. Our evaluations on the TUM-RGBD benchmark shows that SPAQ-DROID-SLAM model surpasses the DROID-SLAM model by an average of 10.5% on absolute trajectory error (ATE) metric. Additionally, our results on the ETH3D SLAM training benchmark demonstrate enhanced generalization capabilities of the SPAQ-DROID-SLAM model, seen by a higher Area Under the Curve (AUC) score and success in 2 additional data sequences compared to the DROIDSLAM model. Despite these improvements, the model exhibits performance variance on the distinct Vicon Room sequences from the EuRoC dataset, which are captured at high angular velocities. This varying performance at some distinct scenarios suggests that designing DL-SLAM algorithms taking operating environments and tasks in consideration can achieve optimal performance and resource efficiency for deployment in resource-constrained embedded platforms.
Abstract:The widespread deployment of products powered by machine learning models is raising concerns around data privacy and information security worldwide. To address this issue, Federated Learning was first proposed as a privacy-preserving alternative to conventional methods that allow multiple learning clients to share model knowledge without disclosing private data. A complementary approach known as Fully Homomorphic Encryption (FHE) is a quantum-safe cryptographic system that enables operations to be performed on encrypted weights. However, implementing mechanisms such as these in practice often comes with significant computational overhead and can expose potential security threats. Novel computing paradigms, such as analog, quantum, and specialized digital hardware, present opportunities for implementing privacy-preserving machine learning systems while enhancing security and mitigating performance loss. This work instantiates these ideas by applying the FHE scheme to a Federated Learning Neural Network architecture that integrates both classical and quantum layers.