Abstract:This paper introduces a powerful encoder that transfers CLIP`s capabilities to event-based data, enhancing its utility and expanding its applicability across diverse domains. While large-scale datasets have significantly advanced image-based models, the scarcity of comprehensive event datasets has limited performance potential in event modality. To address this challenge, we adapt CLIP`s architecture to align event embeddings with image embeddings, supporting zero-shot learning and preserving text alignment while mitigating catastrophic forgetting. Our encoder achieves strong performance in object recognition, with competitive results in zero-shot and few-shot learning tasks. Notably, it generalizes effectively to events extracted from video data without requiring additional training, highlighting its versatility. Additionally, we integrate this encoder within a cross-modality framework that facilitates interaction across five modalities-Image, Event, Text, Sound, and Depth-expanding the possibilities for cross-modal applications. Overall, this work underscores the transformative potential of a robust event encoder, broadening the scope and utility of event-based data across various fields.
Abstract:Applications in the Internet of Things (IoT) utilize machine learning to analyze sensor-generated data. However, a major challenge lies in the lack of targeted intelligence in current sensing systems, leading to vast data generation and increased computational and communication costs. To address this challenge, we propose a novel sensing module to equip sensing frameworks with intelligent data transmission capabilities by integrating a highly efficient machine learning model placed near the sensor. This model provides prompt feedback for the sensing system to transmit only valuable data while discarding irrelevant information by regulating the frequency of data transmission. The near-sensor model is quantized and optimized for real-time sensor control. To enhance the framework's performance, the training process is customized and a "lazy" sensor deactivation strategy utilizing temporal information is introduced. The suggested method is orthogonal to other IoT frameworks and can be considered as a plugin for selective data transmission. The framework is implemented, encompassing both software and hardware components. The experiments demonstrate that the framework utilizing the suggested module achieves over 85% system efficiency in terms of energy consumption and storage, with negligible impact on performance. This methodology has the potential to significantly reduce data output from sensors, benefiting a wide range of IoT applications.