https://github.com/yfguo91/NeuroCLIP.git.
Recently, the neuromorphic vision sensor has received more and more interest. However, the neuromorphic data consists of asynchronous event spikes, which is not natural and difficult to construct a benchmark, thus limiting the neuromorphic data understanding for "unseen" objects by deep learning. Zero-shot and few-shot learning via Contrastive Vision-Language Pre-training (CLIP) have shown inspirational performance in 2D frame image recognition. To handle "unseen" recognition for the neuromorphic data, in this paper, we propose NeuroCLIP, which transfers the CLIP's 2D pre-trained knowledge to event spikes. To improve the few-shot performance, we also provide an inter-timestep adapter based on a spiking neural network. Our code is open-sourced at