Zequn Qin

HeightFormer: Explicit Height Modeling without Extra Data for Camera-only 3D Object Detection in Bird's Eye View

Jul 25, 2023

Yiming Wu, Ruixiang Li, Zequn Qin, Xinhai Zhao, Xi Li

Abstract:Vision-based Bird's Eye View (BEV) representation is an emerging perception formulation for autonomous driving. The core challenge is to construct BEV space with multi-camera features, which is a one-to-many ill-posed problem. Diving into all previous BEV representation generation methods, we found that most of them fall into two types: modeling depths in image views or modeling heights in the BEV space, mostly in an implicit way. In this work, we propose to explicitly model heights in the BEV space, which needs no extra data like LiDAR and can fit arbitrary camera rigs and types compared to modeling depths. Theoretically, we give proof of the equivalence between height-based methods and depth-based methods. Considering the equivalence and some advantages of modeling heights, we propose HeightFormer, which models heights and uncertainties in a self-recursive way. Without any extra data, the proposed HeightFormer could estimate heights in BEV accurately. Benchmark results show that the performance of HeightFormer achieves SOTA compared with those camera-only methods.
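
As an informal illustration of why heights and depths can carry the same information, the sketch below uses a simplified pinhole camera (flat ground, no rotation, camera mounted `camera_height` metres above the ground). It is not the paper's proof or its model; the function names and all parameter values are hypothetical. Under these assumptions, a BEV cell plus a height determines a pixel and a depth, and that pixel plus the depth recovers the same cell and height.

```python
def project(x, z, height, fx, fy, cx, cy, camera_height):
    """BEV cell (x lateral, z forward) plus height above ground -> (u, v, depth)."""
    y = camera_height - height      # camera y axis points down (assumed convention)
    u = fx * x / z + cx
    v = fy * y / z + cy
    return u, v, z                  # depth equals the forward distance z


def unproject(u, v, depth, fx, fy, cx, cy, camera_height):
    """Pixel (u, v) plus depth -> (x, z, height above ground)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return x, depth, camera_height - y


# Hypothetical intrinsics and mounting height, chosen only for illustration.
camera = dict(fx=800.0, fy=800.0, cx=640.0, cy=360.0, camera_height=1.5)

u, v, depth = project(2.0, 20.0, 0.5, **camera)
x, z, height = unproject(u, v, depth, **camera)
print((u, v, depth), (x, z, height))
```

The round trip returns the original cell and height, which is the sense in which the two parameterisations are interchangeable in this toy setting.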

GaitMPL: Gait Recognition with Memory-Augmented Progressive Learning

Jun 06, 2023

Huanzhang Dou, Pengyi Zhang, Yuhan Zhao, Lin Dong, Zequn Qin, Xi Li

Abstract:Gait recognition aims at identifying the pedestrians at a long distance by their biometric gait patterns. It is inherently challenging due to the various covariates and the properties of silhouettes (textureless and colorless), which result in two kinds of pair-wise hard samples: the same pedestrian could have distinct silhouettes (intra-class diversity) and different pedestrians could have similar silhouettes (inter-class similarity). In this work, we propose to solve the hard sample issue with a Memory-augmented Progressive Learning network (GaitMPL), including Dynamic Reweighting Progressive Learning module (DRPL) and Global Structure-Aligned Memory bank (GSAM). Specifically, DRPL reduces the learning difficulty of hard samples by easy-to-hard progressive learning. GSAM further augments DRPL with a structure-aligned memory mechanism, which maintains and models the feature distribution of each ID. Experiments on two commonly used datasets, CASIA-B and OU-MVLP, demonstrate the effectiveness of GaitMPL. On CASIA-B, we achieve the state-of-the-art performance, i.e., 88.0% on the most challenging condition (Clothing) and 93.3% on the average condition, which outperforms the other methods by at least 3.8% and 1.4%, respectively.

* Accepted by TIP2022

UniFormer: Unified Multi-view Fusion Transformer for Spatial-Temporal Representation in Bird's-Eye-View

Jul 18, 2022

Zequn Qin, Jingyu Chen, Chao Chen, Xiaozhi Chen, Xi Li

Abstract:Bird's eye view (BEV) representation is a new perception formulation for autonomous driving, which is based on spatial fusion. Temporal fusion has also been introduced into BEV representation with great success. In this work, we propose a new method that unifies spatial and temporal fusion and merges them into a single mathematical formulation. The unified fusion not only provides a new perspective on BEV fusion but also brings new capabilities. With the proposed unified spatial-temporal fusion, our method can support long-range fusion, which is hard to achieve in conventional BEV methods. Moreover, the BEV fusion in our work is temporal-adaptive, and the weights of temporal fusion are learnable. In contrast, conventional methods mainly use fixed and equal weights for temporal fusion. Besides, the proposed unified fusion avoids the information loss of conventional BEV fusion methods and makes full use of features. Extensive experiments and ablation studies on the NuScenes dataset show the effectiveness of the proposed method, which achieves state-of-the-art performance in the map segmentation task.
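
The contrast between fixed, equal temporal weights and learnable ones can be sketched as follows. This is a minimal illustration, assuming the learnable weights are a softmax over one scalar logit per frame; the abstract does not specify the mechanism, and `fuse_frames` and the example values are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_frames(frame_features, logits):
    """Weighted sum of per-frame feature vectors; weights = softmax(logits)."""
    weights = softmax(logits)
    dim = len(frame_features[0])
    return [
        sum(w * feat[d] for w, feat in zip(weights, frame_features))
        for d in range(dim)
    ]


# Two frames, each with a 2-dimensional BEV feature (toy values).
frames = [[1.0, 2.0], [3.0, 4.0]]

equal = fuse_frames(frames, [0.0, 0.0])     # equal logits reduce to a plain average
adaptive = fuse_frames(frames, [0.0, 2.0])  # larger logit favours the second frame
print(equal, adaptive)
```

With equal logits the fusion collapses to the conventional average, so the fixed-weight scheme is one point in the space the learnable weights can reach.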

Ultra Fast Deep Lane Detection with Hybrid Anchor Driven Ordinal Classification

Jun 15, 2022

Zequn Qin, Pengyi Zhang, Xi Li

Abstract:Modern methods mainly regard lane detection as a problem of pixel-wise segmentation, which struggles with efficiency and with challenging scenarios such as severe occlusions and extreme lighting conditions. Inspired by human perception, the recognition of lanes under severe occlusions and extreme lighting conditions is mainly based on contextual and global information. Motivated by this observation, we propose a novel, simple, yet effective formulation aiming at ultra-fast speed and robustness in challenging scenarios. Specifically, we treat the process of lane detection as an anchor-driven ordinal classification problem using global features. First, we represent lanes with sparse coordinates on a series of hybrid (row and column) anchors. With the help of the anchor-driven representation, we then reformulate the lane detection task as an ordinal classification problem to get the coordinates of lanes. Our method significantly reduces the computational cost with the anchor-driven representation. Using the large receptive field property of the ordinal classification formulation, we can also handle challenging scenarios. Extensive experiments on four lane detection datasets show that our method achieves state-of-the-art performance in terms of both speed and accuracy. A lightweight version can even achieve 300+ frames per second (FPS). Our code is at https://github.com/cfzd/Ultra-Fast-Lane-Detection-v2.
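
One way to turn per-anchor class scores into a lane coordinate is to exploit the ordering of the classes and take the expected class index under a softmax. The sketch below assumes this decoding; the function names and the mapping from index to pixels are hypothetical and are not taken from the linked repository.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def expected_coordinate(scores, image_extent):
    """Decode one anchor's class scores into a coordinate in [0, image_extent].

    Classes are ordered positions along the anchor, so the expectation of the
    class index is a meaningful, sub-cell estimate of the lane position.
    """
    probs = softmax(scores)
    expected_index = sum(k * p for k, p in enumerate(probs))
    return expected_index / (len(probs) - 1) * image_extent


# Three ordered classes on one row anchor of a 100-pixel-wide image (toy values).
print(expected_coordinate([0.0, 0.0, 0.0], 100.0))   # symmetric scores: middle
print(expected_coordinate([0.0, 1.0, 4.0], 100.0))   # mass shifts to the right
```

Because the classes are ordered, the expectation lands between cells rather than snapping to the most likely one.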

* TPAMI 2022

MonoGround: Detecting Monocular 3D Objects from the Ground

Jun 15, 2022

Zequn Qin, Xi Li

Abstract:Monocular 3D object detection has attracted great attention for its advantages in simplicity and cost. Because the 2D-to-3D mapping in the monocular imaging process is inherently ill-posed, monocular 3D object detection suffers from inaccurate depth estimation and thus has poor 3D detection results. To alleviate this problem, we propose to introduce the ground plane as a prior in monocular 3D object detection. The ground plane prior serves as an additional geometric condition on the ill-posed mapping and an extra source for depth estimation. In this way, we can get a more accurate depth estimation from the ground. Meanwhile, to take full advantage of the ground plane prior, we propose a depth-align training strategy and a precise two-stage depth inference method tailored for the ground plane prior. It is worth noting that the introduced ground plane prior requires no extra data sources like LiDAR, stereo images, and depth information. Extensive experiments on the KITTI benchmark show that our method achieves state-of-the-art results compared with other methods while maintaining a very fast speed. Our code and models are available at https://github.com/cfzd/MonoGround.
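
To see why a ground plane constrains depth, consider the textbook pinhole case: a camera with a horizontal optical axis mounted at a known height above flat ground. The sketch below assumes exactly that simplified setup (zero pitch and roll, flat road); it is not the paper's depth-align training or two-stage inference, and `ground_depth` and the numeric values are hypothetical.

```python
def ground_depth(v, fy, cy, camera_height):
    """Depth of the ground point seen at image row v.

    A ground point lies camera_height below the optical centre, so the
    projection v - cy = fy * camera_height / depth can be solved for depth.
    Returns None for rows at or above the horizon, which never hit the ground.
    """
    dv = v - cy
    if dv <= 0:
        return None
    return fy * camera_height / dv


# Hypothetical intrinsics: focal length 700 px, principal row 180, camera 1.65 m high.
print(ground_depth(250.0, fy=700.0, cy=180.0, camera_height=1.65))
print(ground_depth(100.0, fy=700.0, cy=180.0, camera_height=1.65))
```

Every pixel below the horizon thus gets a depth without any LiDAR or stereo input, which is the kind of extra geometric condition the abstract refers to.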

* CVPR22

VersatileGait: A Large-Scale Synthetic Gait Dataset Towards in-the-Wild Simulation

Jun 01, 2021

Pengyi Zhang, Huanzhang Dou, Wenhu Zhang, Yuhan Zhao, Songyuan Li, Zequn Qin, Xi Li

Abstract:Gait recognition has developed rapidly in recent years. However, gait recognition in the wild is not yet well explored. An obvious reason is the lack of training data that is diverse in both intrinsic and extrinsic factors. To remedy this problem, we propose to construct a large-scale gait dataset with the help of controllable computer simulation. In detail, to diversify the intrinsic factors of gait, we generate numerous characters with diverse attributes and give them various walking styles. To diversify the extrinsic factors of gait, we build a complicated scene with a dense camera layout. Finally, we design an automated generation toolkit under Unity3D for simulating the walking scenario and capturing the gait data automatically. As a result, we obtain an in-the-wild gait dataset, called VersatileGait, which has more than one million silhouette sequences of 10,000 subjects with diverse scenarios. VersatileGait possesses several nice properties, including huge dataset size, diverse pedestrian attributes, complicated camera layout, high-quality annotations, small domain gap with the real one, good scalability for new demands, and no privacy issues. Based on VersatileGait, we propose a series of experiments and applications for both research exploration of gait in the wild and practical applications. Our dataset and its corresponding generation toolkit will be publicly available for further studies.

* We should have updated 2101.01394 but we did a new submission

VersatileGait: A Large-Scale Synthetic Gait Dataset with Fine-Grained Attributes and Complicated Scenarios

Jan 05, 2021

Huanzhang Dou, Wenhu Zhang, Pengyi Zhang, Yuhan Zhao, Songyuan Li, Zequn Qin, Fei Wu, Lin Dong, Xi Li

Abstract:Motivated by practical gait recognition applications, we propose to automatically create a large-scale synthetic gait dataset (called VersatileGait) with a game engine. The dataset consists of around one million silhouette sequences of 11,000 subjects with fine-grained attributes in various complicated scenarios. Compared with existing real gait datasets with limited samples and simple scenarios, the proposed VersatileGait dataset possesses several nice properties, including huge dataset size, high sample diversity, high-quality annotations, multi-pitch angles, small domain gap with the real one, etc. Furthermore, we investigate the effectiveness of our dataset (e.g., domain transfer after pretraining). Then, we use the fine-grained attributes from VersatileGait to promote gait recognition in both accuracy and speed, and meanwhile justify the gait recognition performance under multi-pitch angle settings. Additionally, we explore a variety of potential applications for research. Extensive experiments demonstrate the value and effectiveness of the proposed VersatileGait in gait recognition along with its associated applications. We will release both VersatileGait and its corresponding data generation toolkit for further studies.

FcaNet: Frequency Channel Attention Networks

Dec 23, 2020

Zequn Qin, Pengyi Zhang, Fei Wu, Xi Li

Abstract:Attention mechanism, especially channel attention, has gained great success in the computer vision field. Many works focus on how to design efficient channel attention mechanisms while ignoring a fundamental problem, i.e., using global average pooling (GAP) as the unquestionable pre-processing method. In this work, we start from a different view and rethink channel attention using frequency analysis. Based on the frequency analysis, we mathematically prove that the conventional GAP is a special case of the feature decomposition in the frequency domain. With the proof, we naturally generalize the pre-processing of channel attention mechanism in the frequency domain and propose FcaNet with novel multi-spectral channel attention. The proposed method is simple but effective. We can change only one line of code in the calculation to implement our method within existing channel attention methods. Moreover, the proposed method achieves state-of-the-art results compared with other channel attention methods on image classification, object detection, and instance segmentation tasks. Our method could improve by 1.8% in terms of Top-1 accuracy on ImageNet compared with the baseline SENet-50, with the same number of parameters and the same computational cost. Our code and models will be made publicly available.
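
The claim that GAP is a special case of frequency-domain decomposition can be checked numerically. The sketch below assumes an unnormalised 2D DCT-II basis, under which the lowest-frequency component is just the sum of the feature map, so it equals GAP times the number of spatial positions. The function names and the toy feature map are hypothetical, and the paper's normalisation constants may differ.

```python
import math


def dct2d_component(x, h, w):
    """Unnormalised 2D DCT-II coefficient of feature map x at frequency (h, w)."""
    H, W = len(x), len(x[0])
    total = 0.0
    for i in range(H):
        for j in range(W):
            basis = (math.cos(math.pi * h * (i + 0.5) / H)
                     * math.cos(math.pi * w * (j + 0.5) / W))
            total += x[i][j] * basis
    return total


def global_average_pool(x):
    """Mean over all spatial positions of one channel."""
    H, W = len(x), len(x[0])
    return sum(sum(row) for row in x) / (H * W)


# One channel of a toy 2 x 3 feature map.
feature_map = [[1.0, 2.0, 3.0],
               [4.0, 5.0, 6.0]]

lowest = dct2d_component(feature_map, 0, 0)   # basis is 1 everywhere at (0, 0)
gap = global_average_pool(feature_map)
print(lowest, gap * 2 * 3)                    # the two values agree
```

Replacing `(0, 0)` with other frequency indices yields the additional components that a multi-spectral variant could feed to the attention branch, which is consistent with the abstract's remark that only the pooling step needs to change.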

Dynamic Routing with Path Diversity and Consistency for Compact Network Learning

May 29, 2020

Huanyu Wang, Zequn Qin, Xi Li

Abstract:In this paper, we propose a novel dynamic routing inference method with diversity and consistency that takes better advantage of the network capacity. Specifically, diverse routing achieves better utilization of the network, and consistent routing enables better optimization of the routing mechanism. Moreover, we propose a customizable method for controlling computational cost that balances the trade-off between cost and accuracy. Extensive ablation studies and experiments show that our method achieves state-of-the-art results compared with the original full network, other dynamic networks, and model compression methods. Our code will be made publicly available.

Ultra Fast Structure-aware Deep Lane Detection

May 20, 2020

Zequn Qin, Huanyu Wang, Xi Li

Abstract:Modern methods mainly regard lane detection as a problem of pixel-wise segmentation, which struggles with challenging scenarios and speed. Inspired by human perception, the recognition of lanes under severe occlusion and extreme lighting conditions is mainly based on contextual and global information. Motivated by this observation, we propose a novel, simple, yet effective formulation aiming at extremely fast speed and challenging scenarios. Specifically, we treat the process of lane detection as a row-based selecting problem using global features. With the help of row-based selecting, our formulation significantly reduces the computational cost. Using a large receptive field on global features, we can also handle challenging scenarios. Moreover, based on the formulation, we also propose a structural loss to explicitly model the structure of lanes. Extensive experiments on two lane detection benchmark datasets show that our method achieves state-of-the-art performance in terms of both speed and accuracy. A lightweight version can even achieve 300+ frames per second at the same resolution, which is at least 4x faster than previous state-of-the-art methods. Our code will be made publicly available.
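
Row-based selecting can be pictured as one small classification per predefined row instead of one per pixel. The sketch below assumes each row is divided into a few cells and that the last class means "no lane in this row"; that extra class, the function name, and the toy scores are assumptions made for illustration rather than details stated in the abstract.

```python
def select_lane_cells(row_scores):
    """Pick one cell per row anchor for a single lane.

    row_scores holds one list of scores per row; each list covers the grid
    cells of that row plus a final 'no lane' entry (assumed convention).
    Returns the chosen cell index per row, or None where no lane is present.
    """
    selected = []
    for scores in row_scores:
        best = max(range(len(scores)), key=lambda k: scores[k])
        selected.append(None if best == len(scores) - 1 else best)
    return selected


# Three row anchors, three cells each, plus the trailing 'no lane' score.
scores = [
    [0.1, 2.0, 0.3, 0.0],   # lane in cell 1
    [0.2, 0.4, 1.5, 0.1],   # lane in cell 2
    [0.0, 0.1, 0.2, 3.0],   # no lane on this row
]
print(select_lane_cells(scores))
```

The work per lane scales with the number of rows times the number of cells, which is far smaller than the pixel count of a segmentation map; this is the intuition behind the reduced computational cost.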
