Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Junjie Jiang

Nonlinear Motion-Guided and Spatio-Temporal Aware Network for Unsupervised Event-Based Optical Flow

May 08, 2025

Zuntao Liu, Hao Zhuang, Junjie Jiang, Yuhang Song, Zheng Fang

Abstract:Event cameras have the potential to capture continuous motion information over time and space, making them well-suited for optical flow estimation. However, most existing learning-based methods for event-based optical flow adopt frame-based techniques, ignoring the spatio-temporal characteristics of events. Additionally, these methods assume linear motion between consecutive events within the loss time window, which increases optical flow errors in long-time sequences. In this work, we observe that rich spatio-temporal information and accurate nonlinear motion between events are crucial for event-based optical flow estimation. Therefore, we propose E-NMSTFlow, a novel unsupervised event-based optical flow network focusing on long-time sequences. We propose a Spatio-Temporal Motion Feature Aware (STMFA) module and an Adaptive Motion Feature Enhancement (AMFE) module, both of which utilize rich spatio-temporal information to learn spatio-temporal data associations. Meanwhile, we propose a nonlinear motion compensation loss that utilizes the accurate nonlinear motion between events to improve the unsupervised learning of our network. Extensive experiments demonstrate the effectiveness and superiority of our method. Remarkably, our method ranks first among unsupervised learning methods on the MVSEC and DSEC-Flow datasets. Our project page is available at https://wynelio.github.io/E-NMSTFlow.

* Accepted to ICRA 2025. Project Page: https://wynelio.github.io/E-NMSTFlow

Via

Access Paper or Ask Questions

SAM2MOT: A Novel Paradigm of Multi-Object Tracking by Segmentation

Apr 06, 2025

Junjie Jiang, Zelin Wang, Manqi Zhao, Yin Li, DongSheng Jiang

Abstract:Segment Anything 2 (SAM2) enables robust single-object tracking using segmentation. To extend this to multi-object tracking (MOT), we propose SAM2MOT, introducing a novel Tracking by Segmentation paradigm. Unlike Tracking by Detection or Tracking by Query, SAM2MOT directly generates tracking boxes from segmentation masks, reducing reliance on detection accuracy. SAM2MOT has two key advantages: zero-shot generalization, allowing it to work across datasets without fine-tuning, and strong object association, inherited from SAM2. To further improve performance, we integrate a trajectory manager system for precise object addition and removal, and a cross-object interaction module to handle occlusions. Experiments on DanceTrack, UAVDT, and BDD100K show state-of-the-art results. Notably, SAM2MOT outperforms existing methods on DanceTrack by +2.1 HOTA and +4.5 IDF1, highlighting its effectiveness in MOT.

Via

Access Paper or Ask Questions

Optimal Brain Apoptosis

Feb 25, 2025

Mingyuan Sun, Zheng Fang, Jiaxu Wang, Junjie Jiang, Delei Kong, Chenming Hu, Yuetong Fang, Renjing Xu

Abstract:The increasing complexity and parameter count of Convolutional Neural Networks (CNNs) and Transformers pose challenges in terms of computational efficiency and resource demands. Pruning has been identified as an effective strategy to address these challenges by removing redundant elements such as neurons, channels, or connections, thereby enhancing computational efficiency without heavily compromising performance. This paper builds on the foundational work of Optimal Brain Damage (OBD) by advancing the methodology of parameter importance estimation using the Hessian matrix. Unlike previous approaches that rely on approximations, we introduce Optimal Brain Apoptosis (OBA), a novel pruning method that calculates the Hessian-vector product value directly for each parameter. By decomposing the Hessian matrix across network layers and identifying conditions under which inter-layer Hessian submatrices are non-zero, we propose a highly efficient technique for computing the second-order Taylor expansion of parameters. This approach allows for a more precise pruning process, particularly in the context of CNNs and Transformers, as validated in our experiments including VGG19, ResNet32, ResNet50, and ViT-B/16 on CIFAR10, CIFAR100 and Imagenet datasets. Our code is available at https://github.com/NEU-REAL/OBA.

* Accepted to ICLR 2025

Via

Access Paper or Ask Questions

Fully Asynchronous Neuromorphic Perception for Mobile Robot Dodging with Loihi Chips

Oct 14, 2024

Junjie Jiang, Delei Kong, Chenming Hu, Zheng Fang

Abstract:Sparse and asynchronous sensing and processing in natural organisms lead to ultra low-latency and energy-efficient perception. Event cameras, known as neuromorphic vision sensors, are designed to mimic these characteristics. However, fully utilizing the sparse and asynchronous event stream remains challenging. Influenced by the mature algorithms of standard cameras, most existing event-based algorithms still rely on the "group of events" processing paradigm (e.g., event frames, 3D voxels) when handling event streams. This paradigm encounters issues such as feature loss, event stacking, and high computational burden, which deviates from the intended purpose of event cameras. To address these issues, we propose a fully asynchronous neuromorphic paradigm that integrates event cameras, spiking networks, and neuromorphic processors (Intel Loihi). This paradigm can faithfully process each event asynchronously as it arrives, mimicking the spike-driven signal processing in biological brains. We compare the proposed paradigm with the existing "group of events" processing paradigm in detail on the real mobile robot dodging task. Experimental results show that our scheme exhibits better robustness than frame-based methods with different time windows and light conditions. Additionally, the energy consumption per inference of our scheme on the embedded Loihi processor is only 4.30% of that of the event spike tensor method on NVIDIA Jetson Orin NX with energy-saving mode, and 1.64% of that of the event frame method on the same neuromorphic processor. As far as we know, this is the first time that a fully asynchronous neuromorphic paradigm has been implemented for solving sequential tasks on real mobile robot.

Via

Access Paper or Ask Questions

EV-MGDispNet: Motion-Guided Event-Based Stereo Disparity Estimation Network with Left-Right Consistency

Aug 10, 2024

Junjie Jiang, Hao Zhuang, Xinjie Huang, Delei Kong, Zheng Fang

Figure 1 for EV-MGDispNet: Motion-Guided Event-Based Stereo Disparity Estimation Network with Left-Right Consistency

Figure 2 for EV-MGDispNet: Motion-Guided Event-Based Stereo Disparity Estimation Network with Left-Right Consistency

Figure 3 for EV-MGDispNet: Motion-Guided Event-Based Stereo Disparity Estimation Network with Left-Right Consistency

Figure 4 for EV-MGDispNet: Motion-Guided Event-Based Stereo Disparity Estimation Network with Left-Right Consistency

Abstract:Event cameras have the potential to revolutionize the field of robot vision, particularly in areas like stereo disparity estimation, owing to their high temporal resolution and high dynamic range. Many studies use deep learning for event camera stereo disparity estimation. However, these methods fail to fully exploit the temporal information in the event stream to acquire clear event representations. Additionally, there is room for further reduction in pixel shifts in the feature maps before constructing the cost volume. In this paper, we propose EV-MGDispNet, a novel event-based stereo disparity estimation method. Firstly, we propose an edge-aware aggregation (EAA) module, which fuses event frames and motion confidence maps to generate a novel clear event representation. Then, we propose a motion-guided attention (MGA) module, where motion confidence maps utilize deformable transformer encoders to enhance the feature map with more accurate edges. Finally, we also add a census left-right consistency loss function to enhance the left-right consistency of stereo event representation. Through conducting experiments within challenging real-world driving scenarios, we validate that our method outperforms currently known state-of-the-art methods in terms of mean absolute error (MAE) and root mean square error (RMSE) metrics.

Via

Access Paper or Ask Questions

Spike-EVPR: Deep Spiking Residual Network with Cross-Representation Aggregation for Event-Based Visual Place Recognition

Feb 16, 2024

Chenming Hu, Zheng Fang, Kuanxu Hou, Delei Kong, Junjie Jiang, Hao Zhuang, Mingyuan Sun, Xinjie Huang

Figure 1 for Spike-EVPR: Deep Spiking Residual Network with Cross-Representation Aggregation for Event-Based Visual Place Recognition

Figure 2 for Spike-EVPR: Deep Spiking Residual Network with Cross-Representation Aggregation for Event-Based Visual Place Recognition

Figure 3 for Spike-EVPR: Deep Spiking Residual Network with Cross-Representation Aggregation for Event-Based Visual Place Recognition

Figure 4 for Spike-EVPR: Deep Spiking Residual Network with Cross-Representation Aggregation for Event-Based Visual Place Recognition

Abstract:Event cameras have been successfully applied to visual place recognition (VPR) tasks by using deep artificial neural networks (ANNs) in recent years. However, previously proposed deep ANN architectures are often unable to harness the abundant temporal information presented in event streams. In contrast, deep spiking networks exhibit more intricate spatiotemporal dynamics and are inherently well-suited to process sparse asynchronous event streams. Unfortunately, directly inputting temporal-dense event volumes into the spiking network introduces excessive time steps, resulting in prohibitively high training costs for large-scale VPR tasks. To address the aforementioned issues, we propose a novel deep spiking network architecture called Spike-EVPR for event-based VPR tasks. First, we introduce two novel event representations tailored for SNN to fully exploit the spatio-temporal information from the event streams, and reduce the video memory occupation during training as much as possible. Then, to exploit the full potential of these two representations, we construct a Bifurcated Spike Residual Encoder (BSR-Encoder) with powerful representational capabilities to better extract the high-level features from the two event representations. Next, we introduce a Shared & Specific Descriptor Extractor (SSD-Extractor). This module is designed to extract features shared between the two representations and features specific to each. Finally, we propose a Cross-Descriptor Aggregation Module (CDA-Module) that fuses the above three features to generate a refined, robust global descriptor of the scene. Our experimental results indicate the superior performance of our Spike-EVPR compared to several existing EVPR pipelines on Brisbane-Event-VPR and DDD20 datasets, with the average Recall@1 increased by 7.61% on Brisbane and 13.20% on DDD20.

* 14 pages, 10 figures

Via

Access Paper or Ask Questions

FE-Fusion-VPR: Attention-based Multi-Scale Network Architecture for Visual Place Recognition by Fusing Frames and Events

Nov 23, 2022

Kuanxu Hou, Delei Kong, Junjie Jiang, Hao Zhuang, Xinjie Huang, Zheng Fang

Figure 1 for FE-Fusion-VPR: Attention-based Multi-Scale Network Architecture for Visual Place Recognition by Fusing Frames and Events

Figure 2 for FE-Fusion-VPR: Attention-based Multi-Scale Network Architecture for Visual Place Recognition by Fusing Frames and Events

Figure 3 for FE-Fusion-VPR: Attention-based Multi-Scale Network Architecture for Visual Place Recognition by Fusing Frames and Events

Figure 4 for FE-Fusion-VPR: Attention-based Multi-Scale Network Architecture for Visual Place Recognition by Fusing Frames and Events

Abstract:Traditional visual place recognition (VPR), usually using standard cameras, is easy to fail due to glare or high-speed motion. By contrast, event cameras have the advantages of low latency, high temporal resolution, and high dynamic range, which can deal with the above issues. Nevertheless, event cameras are prone to failure in weakly textured or motionless scenes, while standard cameras can still provide appearance information in this case. Thus, exploiting the complementarity of standard cameras and event cameras can effectively improve the performance of VPR algorithms. In the paper, we propose FE-Fusion-VPR, an attention-based multi-scale network architecture for VPR by fusing frames and events. First, the intensity frame and event volume are fed into the two-stream feature extraction network for shallow feature fusion. Next, the three-scale features are obtained through the multi-scale fusion network and aggregated into three sub-descriptors using the VLAD layer. Finally, the weight of each sub-descriptor is learned through the descriptor re-weighting network to obtain the final refined descriptor. Experimental results show that on the Brisbane-Event-VPR and DDD20 datasets, the Recall@1 of our FE-Fusion-VPR is 29.26% and 33.59% higher than Event-VPR and Ensemble-EventVPR, and is 7.00% and 14.15% higher than MultiRes-NetVLAD and NetVLAD. To our knowledge, this is the first end-to-end network that goes beyond the existing event-based and frame-based SOTA methods to fuse frame and events directly for VPR.

Via

Access Paper or Ask Questions

Neuro-Planner: A 3D Visual Navigation Method for MAV with Depth Camera based on Neuromorphic Reinforcement Learning

Oct 05, 2022

Junjie Jiang, Delei Kong, Kuanxv Hou, Xinjie Huang, Hao Zhuang, Fang Zheng

Figure 1 for Neuro-Planner: A 3D Visual Navigation Method for MAV with Depth Camera based on Neuromorphic Reinforcement Learning

Figure 2 for Neuro-Planner: A 3D Visual Navigation Method for MAV with Depth Camera based on Neuromorphic Reinforcement Learning

Figure 3 for Neuro-Planner: A 3D Visual Navigation Method for MAV with Depth Camera based on Neuromorphic Reinforcement Learning

Figure 4 for Neuro-Planner: A 3D Visual Navigation Method for MAV with Depth Camera based on Neuromorphic Reinforcement Learning

Abstract:Traditional visual navigation methods of micro aerial vehicle (MAV) usually calculate a passable path that satisfies the constraints depending on a prior map. However, these methods have issues such as high demand for computing resources and poor robustness in face of unfamiliar environments. Aiming to solve the above problems, we propose a neuromorphic reinforcement learning method (Neuro-Planner) that combines spiking neural network (SNN) and deep reinforcement learning (DRL) to realize MAV 3D visual navigation with depth camera. Specifically, we design spiking actor network based on two-state LIF (TS-LIF) neurons and its encoding-decoding schemes for efficient inference. Then our improved hybrid deep deterministic policy gradient (HDDPG) and TS-LIF-based spatio-temporal back propagation (STBP) algorithms are used as the training framework for actor-critic network architecture. To verify the effectiveness of the proposed Neuro-Planner, we carry out detailed comparison experiments with various SNN training algorithm (STBP, BPTT and SLAYER) in the software-in-the-loop (SITL) simulation framework. The navigation success rate of our HDDPG-STBP is 4.3\% and 5.3\% higher than that of the original DDPG in the two evaluation environments. To the best of our knowledge, this is the first work combining neuromorphic computing and deep reinforcement learning for MAV 3D visual navigation task.

Via

Access Paper or Ask Questions

Predicting extreme events from data using deep machine learning: when and where

Mar 31, 2022

Junjie Jiang, Zi-Gang Huang, Celso Grebogi, Ying-Cheng Lai

Figure 1 for Predicting extreme events from data using deep machine learning: when and where

Figure 2 for Predicting extreme events from data using deep machine learning: when and where

Figure 3 for Predicting extreme events from data using deep machine learning: when and where

Figure 4 for Predicting extreme events from data using deep machine learning: when and where

Abstract:We develop a deep convolutional neural network (DCNN) based framework for model-free prediction of the occurrence of extreme events both in time ("when") and in space ("where") in nonlinear physical systems of spatial dimension two. The measurements or data are a set of two-dimensional snapshots or images. For a desired time horizon of prediction, a proper labeling scheme can be designated to enable successful training of the DCNN and subsequent prediction of extreme events in time. Given that an extreme event has been predicted to occur within the time horizon, a space-based labeling scheme can be applied to predict, within certain resolution, the location at which the event will occur. We use synthetic data from the 2D complex Ginzburg-Landau equation and empirical wind speed data of the North Atlantic ocean to demonstrate and validate our machine-learning based prediction framework. The trade-offs among the prediction horizon, spatial resolution, and accuracy are illustrated, and the detrimental effect of spatially biased occurrence of extreme event on prediction accuracy is discussed. The deep learning framework is viable for predicting extreme events in the real world.

* 15 pages, 10 figures

Via

Access Paper or Ask Questions

Long-term prediction of chaotic systems with recurrent neural networks

Mar 06, 2020

Huawei Fan, Junjie Jiang, Chun Zhang, Xingang Wang, Ying-Cheng Lai

Figure 1 for Long-term prediction of chaotic systems with recurrent neural networks

Figure 2 for Long-term prediction of chaotic systems with recurrent neural networks

Figure 3 for Long-term prediction of chaotic systems with recurrent neural networks

Figure 4 for Long-term prediction of chaotic systems with recurrent neural networks

Abstract:Reservoir computing systems, a class of recurrent neural networks, have recently been exploited for model-free, data-based prediction of the state evolution of a variety of chaotic dynamical systems. The prediction horizon demonstrated has been about half dozen Lyapunov time. Is it possible to significantly extend the prediction time beyond what has been achieved so far? We articulate a scheme incorporating time-dependent but sparse data inputs into reservoir computing and demonstrate that such rare "updates" of the actual state practically enable an arbitrarily long prediction horizon for a variety of chaotic systems. A physical understanding based on the theory of temporal synchronization is developed.

* 10 pages, 8 figures

Via

Access Paper or Ask Questions