Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yan Cui

ST-FlowNet: An Efficient Spiking Neural Network for Event-Based Optical Flow Estimation

Mar 13, 2025

Hongze Sun, Jun Wang, Wuque Cai, Duo Chen, Qianqian Liao, Jiayi He, Yan Cui, Dezhong Yao, Daqing Guo

Abstract:Spiking Neural Networks (SNNs) have emerged as a promising tool for event-based optical flow estimation tasks due to their ability to leverage spatio-temporal information and low-power capabilities. However, the performance of SNN models is often constrained, limiting their application in real-world scenarios. In this work, we address this gap by proposing a novel neural network architecture, ST-FlowNet, specifically tailored for optical flow estimation from event-based data. The ST-FlowNet architecture integrates ConvGRU modules to facilitate cross-modal feature augmentation and temporal alignment of the predicted optical flow, improving the network's ability to capture complex motion dynamics. Additionally, to overcome the challenges associated with training SNNs, we introduce a novel approach to derive SNN models from pre-trained artificial neural networks (ANNs) through ANN-to-SNN conversion or our proposed BISNN method. Notably, the BISNN method alleviates the complexities involved in biological parameter selection, further enhancing the robustness of SNNs in optical flow estimation tasks. Extensive evaluations on three benchmark event-based datasets demonstrate that the SNN-based ST-FlowNet model outperforms state-of-the-art methods, delivering superior performance in accurate optical flow estimation across a diverse range of dynamic visual scenes. Furthermore, the inherent energy efficiency of SNN models is highlighted, establishing a compelling advantage for their practical deployment. Overall, our work presents a novel framework for optical flow estimation using SNNs and event-based data, contributing to the advancement of neuromorphic vision applications.

* 12 pages, 5 figures, 5 tables; This work has been submitted for possible publication

Via

Access Paper or Ask Questions

Reliable Object Tracking by Multimodal Hybrid Feature Extraction and Transformer-Based Fusion

May 28, 2024

Hongze Sun, Rui Liu, Wuque Cai, Jun Wang, Yue Wang, Huajin Tang, Yan Cui, Dezhong Yao, Daqing Guo

Figure 1 for Reliable Object Tracking by Multimodal Hybrid Feature Extraction and Transformer-Based Fusion

Figure 2 for Reliable Object Tracking by Multimodal Hybrid Feature Extraction and Transformer-Based Fusion

Figure 3 for Reliable Object Tracking by Multimodal Hybrid Feature Extraction and Transformer-Based Fusion

Figure 4 for Reliable Object Tracking by Multimodal Hybrid Feature Extraction and Transformer-Based Fusion

Abstract:Visual object tracking, which is primarily based on visible light image sequences, encounters numerous challenges in complicated scenarios, such as low light conditions, high dynamic ranges, and background clutter. To address these challenges, incorporating the advantages of multiple visual modalities is a promising solution for achieving reliable object tracking. However, the existing approaches usually integrate multimodal inputs through adaptive local feature interactions, which cannot leverage the full potential of visual cues, thus resulting in insufficient feature modeling. In this study, we propose a novel multimodal hybrid tracker (MMHT) that utilizes frame-event-based data for reliable single object tracking. The MMHT model employs a hybrid backbone consisting of an artificial neural network (ANN) and a spiking neural network (SNN) to extract dominant features from different visual modalities and then uses a unified encoder to align the features across different domains. Moreover, we propose an enhanced transformer-based module to fuse multimodal features using attention mechanisms. With these methods, the MMHT model can effectively construct a multiscale and multidimensional visual feature space and achieve discriminative feature modeling. Extensive experiments demonstrate that the MMHT model exhibits competitive performance in comparison with that of other state-of-the-art methods. Overall, our results highlight the effectiveness of the MMHT model in terms of addressing the challenges faced in visual object tracking tasks.

* 16 pages, 7 figures, 9 tabes; This work has been submitted for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Via

Access Paper or Ask Questions

Better Explain Transformers by Illuminating Important Information

Jan 26, 2024

Linxin Song, Yan Cui, Ao Luo, Freddy Lecue, Irene Li

Figure 1 for Better Explain Transformers by Illuminating Important Information

Figure 2 for Better Explain Transformers by Illuminating Important Information

Figure 3 for Better Explain Transformers by Illuminating Important Information

Figure 4 for Better Explain Transformers by Illuminating Important Information

Abstract:Transformer-based models excel in various natural language processing (NLP) tasks, attracting countless efforts to explain their inner workings. Prior methods explain Transformers by focusing on the raw gradient and attention as token attribution scores, where non-relevant information is often considered during explanation computation, resulting in confusing results. In this work, we propose highlighting the important information and eliminating irrelevant information by a refined information flow on top of the layer-wise relevance propagation (LRP) method. Specifically, we consider identifying syntactic and positional heads as important attention heads and focus on the relevance obtained from these important heads. Experimental results demonstrate that irrelevant information does distort output attribution scores and then should be masked during explanation computation. Compared to eight baselines on both classification and question-answering datasets, our method consistently outperforms with over 3\% to 33\% improvement on explanation metrics, providing superior explanation performance. Our anonymous code repository is available at: https://github.com/LinxinS97/Mask-LRP

Via

Access Paper or Ask Questions

CEIR: Concept-based Explainable Image Representation Learning

Dec 17, 2023

Yan Cui, Shuhong Liu, Liuzhuozheng Li, Zhiyuan Yuan

Abstract:In modern machine learning, the trend of harnessing self-supervised learning to derive high-quality representations without label dependency has garnered significant attention. However, the absence of label information, coupled with the inherently high-dimensional nature, improves the difficulty for the interpretation of learned representations. Consequently, indirect evaluations become the popular metric for evaluating the quality of these features, leading to a biased validation of the learned representation rationale. To address these challenges, we introduce a novel approach termed Concept-based Explainable Image Representation (CEIR). Initially, using the Concept-based Model (CBM) incorporated with pretrained CLIP and concepts generated by GPT-4, we project input images into a concept vector space. Subsequently, a Variational Autoencoder (VAE) learns the latent representation from these projected concepts, which serves as the final image representation. Due to the capability of the representation to encapsulate high-level, semantically relevant concepts, the model allows for attributions to a human-comprehensible concept space. This not only enhances interpretability but also preserves the robustness essential for downstream tasks. For instance, our method exhibits state-of-the-art unsupervised clustering performance on benchmarks such as CIFAR10, CIFAR100, and STL10. Furthermore, capitalizing on the universality of human conceptual understanding, CEIR can seamlessly extract the related concept from open-world images without fine-tuning. This offers a fresh approach to automatic label generation and label manipulation.

* 8 pages

Via

Access Paper or Ask Questions

A Spatial-channel-temporal-fused Attention for Spiking Neural Networks

Sep 22, 2022

Wuque Cai, Hongze Sun, Rui Liu, Yan Cui, Jun Wang, Yang Xia, Dezhong Yao, Daqing Guo

Figure 1 for A Spatial-channel-temporal-fused Attention for Spiking Neural Networks

Figure 2 for A Spatial-channel-temporal-fused Attention for Spiking Neural Networks

Figure 3 for A Spatial-channel-temporal-fused Attention for Spiking Neural Networks

Figure 4 for A Spatial-channel-temporal-fused Attention for Spiking Neural Networks

Abstract:Spiking neural networks (SNNs) mimic brain computational strategies, and exhibit substantial capabilities in spatiotemporal information processing. As an essential factor for human perception, visual attention refers to the dynamic selection process of salient regions in biological vision systems. Although mechanisms of visual attention have achieved great success in computer vision, they are rarely introduced into SNNs. Inspired by experimental observations on predictive attentional remapping, we here propose a new spatial-channel-temporal-fused attention (SCTFA) module that can guide SNNs to efficiently capture underlying target regions by utilizing historically accumulated spatial-channel information. Through a systematic evaluation on three event stream datasets (DVS Gesture, SL-Animals-DVS and MNIST-DVS), we demonstrate that the SNN with the SCTFA module (SCTFA-SNN) not only significantly outperforms the baseline SNN (BL-SNN) and other two SNN models with degenerated attention modules, but also achieves competitive accuracy with existing state-of-the-art methods. Additionally, our detailed analysis shows that the proposed SCTFA-SNN model has strong robustness to noise and outstanding stability to incomplete data, while maintaining acceptable complexity and efficiency. Overall, these findings indicate that appropriately incorporating cognitive mechanisms of the brain may provide a promising approach to elevate the capability of SNNs.

* 12 pages, 8 figures, 5 tabes; This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Via

Access Paper or Ask Questions

A Synapse-Threshold Synergistic Learning Approach for Spiking Neural Networks

Jun 10, 2022

Hongze Sun, Wuque Cai, Baoxin Yang, Yan Cui, Yang Xia, Dezhong Yao, Daqing Guo

Figure 1 for A Synapse-Threshold Synergistic Learning Approach for Spiking Neural Networks

Figure 2 for A Synapse-Threshold Synergistic Learning Approach for Spiking Neural Networks

Figure 3 for A Synapse-Threshold Synergistic Learning Approach for Spiking Neural Networks

Figure 4 for A Synapse-Threshold Synergistic Learning Approach for Spiking Neural Networks

Abstract:Spiking neural networks (SNNs) have demonstrated excellent capabilities in various intelligent scenarios. Most existing methods for training SNNs are based on the concept of synaptic plasticity; however, learning in the realistic brain also utilizes intrinsic non-synaptic mechanisms of neurons. The spike threshold of biological neurons is a critical intrinsic neuronal feature that exhibits rich dynamics on a millisecond timescale and has been proposed as an underlying mechanism that facilitates neural information processing. In this study, we develop a novel synergistic learning approach that simultaneously trains synaptic weights and spike thresholds in SNNs. SNNs trained with synapse-threshold synergistic learning (STL-SNNs) achieve significantly higher accuracies on various static and neuromorphic datasets than SNNs trained with two single-learning models of the synaptic learning (SL) and the threshold learning (TL). During training, the synergistic learning approach optimizes neural thresholds, providing the network with stable signal transmission via appropriate firing rates. Further analysis indicates that STL-SNNs are robust to noisy data and exhibit low energy consumption for deep network structures. Additionally, the performance of STL-SNN can be further improved by introducing a generalized joint decision framework (JDF). Overall, our findings indicate that biologically plausible synergies between synaptic and intrinsic non-synaptic mechanisms may provide a promising approach for developing highly efficient SNN learning methods.

* 13 pages, 9 figures, submitted to the IEEE Transactions on Neural Networks and Learning Systems (TNNLS)

Via

Access Paper or Ask Questions

Monocular Depth Estimation with Sharp Boundary

Oct 12, 2021

Xin Yang, Qingling Chang, Xinlin Liu, Yan Cui

Figure 1 for Monocular Depth Estimation with Sharp Boundary

Figure 2 for Monocular Depth Estimation with Sharp Boundary

Figure 3 for Monocular Depth Estimation with Sharp Boundary

Figure 4 for Monocular Depth Estimation with Sharp Boundary

Abstract:Monocular depth estimation is the base task in computer vision. It has a tremendous development in the decade with the development of deep learning. But the boundary blur of the depth map is still a serious problem. Research finds the boundary blur problem is mainly caused by two factors, first, the low-level features containing boundary and structure information may loss in deeper networks during the convolution process., second, the model ignores the errors introduced by the boundary area due to the few portions of the boundary in the whole areas during the backpropagation. In order to mitigate the boundary blur problem, we focus on the above two impact factors. Firstly, we design a scene understanding module to learn the global information with low- and high-level features, and then to transform the global information to different scales with our proposed scale transform module according to the different phases in the decoder. Secondly, we propose a boundary-aware depth loss function to pay attention to the effects of the boundary's depth value. The extensive experiments show that our method can predict the depth maps with clearer boundaries, and the performance of the depth accuracy base on NYU-depth v2 and SUN RGB-D is competitive.

* 20 pages,9 figures

Via

Access Paper or Ask Questions