Maguelonne Héritier

FFAVOD: Feature Fusion Architecture for Video Object Detection

Sep 15, 2021

Hughes Perreault, Guillaume-Alexandre Bilodeau, Nicolas Saunier, Maguelonne Héritier

Abstract: Consecutive frames of a video contain a significant amount of redundancy. Object detectors typically produce detections for one image at a time, with no way of exploiting this redundancy. Yet many object detection applications work with videos, including intelligent transportation systems, advanced driver assistance systems and video surveillance. Our work aims to exploit the similarity between video frames to produce better detections. We propose FFAVOD, which stands for feature fusion architecture for video object detection. First, we introduce a novel video object detection architecture that allows a network to share feature maps between nearby frames. Second, we propose a feature fusion module that learns to merge feature maps to enhance them. We show that the proposed architecture and fusion module can improve the performance of three base object detectors on two object detection benchmarks containing sequences of moving road users. Additionally, to further increase performance, we propose an improvement to the SpotNet attention module. Using our architecture on the improved SpotNet detector, we obtain state-of-the-art performance on the UA-DETRAC public benchmark as well as on the UAVDT dataset. Code is available at https://github.com/hu64/FFAVOD.

* Accepted for publication in Pattern Recognition Letters
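
The idea of sharing feature maps between nearby frames can be illustrated with a small caching sketch. This is not the authors' implementation: the `FeatureCache` class, the `window_indices` helper, the window `radius` and the clamping of indices at the video boundaries are all assumptions made for illustration. It only shows how each frame's feature map could be computed once and reused by every window that contains it.

```python
from typing import Callable, Dict, List


def window_indices(t: int, num_frames: int, radius: int) -> List[int]:
    """Indices of frame t and its neighbours, clamped to the video bounds (assumption)."""
    return [min(max(i, 0), num_frames - 1) for i in range(t - radius, t + radius + 1)]


class FeatureCache:
    """Hypothetical helper: computes each frame's feature map at most once."""

    def __init__(self, backbone: Callable[[int], object]) -> None:
        self.backbone = backbone          # stand-in for a per-frame feature extractor
        self.store: Dict[int, object] = {}
        self.calls = 0                    # number of backbone evaluations so far

    def get(self, frame_idx: int) -> object:
        if frame_idx not in self.store:
            self.store[frame_idx] = self.backbone(frame_idx)
            self.calls += 1
        return self.store[frame_idx]


def features_for_detection(cache: FeatureCache, t: int, num_frames: int, radius: int) -> List[object]:
    """Feature maps of the target frame and its neighbours, ready to be fused."""
    return [cache.get(i) for i in window_indices(t, num_frames, radius)]
```

With a window radius of 1, processing every frame of a five-frame clip triggers five backbone evaluations rather than fifteen, because neighbouring windows reuse the cached maps.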

CenterPoly: real-time instance segmentation using bounding polygons

Aug 19, 2021

Hughes Perreault, Guillaume-Alexandre Bilodeau, Nicolas Saunier, Maguelonne Héritier

Abstract: We present a novel method, called CenterPoly, for real-time instance segmentation using bounding polygons. We apply it to detect road users in dense urban environments, making it suitable for applications in intelligent transportation systems such as automated vehicles. CenterPoly detects objects by their center keypoint while predicting a fixed number of polygon vertices for each object, thus performing detection and segmentation in parallel. Most of the network parameters are shared by the network heads, making it fast and lightweight enough to run at real-time speed. To properly convert mask ground-truth to polygon ground-truth, we designed a vertex selection strategy to facilitate the learning of the polygons. Additionally, to better segment overlapping objects in dense urban scenes, we train a relative depth branch to determine which instances are closer and which are farther, using available weak annotations. We propose several models with different backbones to show the possible speed/accuracy trade-offs. The models were trained and evaluated on Cityscapes, KITTI and IDD, and results are reported on their public benchmarks, where they are state-of-the-art at real-time speeds. Code is available at https://github.com/hu64/CenterPoly.

* Accepted to the 2nd Autonomous Vehicle Vision Workshop (AVVision)
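
As a rough illustration of representing an object by a center keypoint plus a fixed number of polygon vertices, the sketch below decodes a polygon from a center and per-vertex offsets. The offset encoding `(dx, dy)` relative to the center and the function names are assumptions for this example, not details taken from the abstract. The shoelace area is included only as a simple sanity check on the decoded shape.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def decode_polygon(center: Point, offsets: List[Point]) -> List[Point]:
    """Turn a center keypoint and per-vertex (dx, dy) offsets into absolute vertices."""
    cx, cy = center
    return [(cx + dx, cy + dy) for dx, dy in offsets]


def polygon_area(vertices: List[Point]) -> float:
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]   # wrap around to close the polygon
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


# Hypothetical detection: a 4 x 2 rectangle centered at (10, 20).
polygon = decode_polygon((10.0, 20.0), [(-2.0, -1.0), (2.0, -1.0), (2.0, 1.0), (-2.0, 1.0)])
```

Because every object uses the same number of vertices, the offsets can be regressed by a fixed-size output head at each center location.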

RN-VID: A Feature Fusion Architecture for Video Object Detection

Apr 02, 2020

Hughes Perreault, Maguelonne Héritier, Pierre Gravel, Guillaume-Alexandre Bilodeau, Nicolas Saunier

Abstract: Consecutive frames in a video are highly redundant. Therefore, to perform video object detection, running single-frame detectors on every frame without reusing any information is quite wasteful. With this idea in mind, we propose RN-VID (standing for RetinaNet-VIDeo), a novel approach to video object detection. Our contributions are twofold. First, we propose a new architecture that uses information from nearby frames to enhance feature maps. Second, we propose a novel module to merge feature maps of the same dimensions using re-ordering of channels and 1 x 1 convolutions. We then demonstrate that RN-VID achieves better mean average precision (mAP) than the corresponding single-frame detectors with little additional cost during inference.
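
One way to picture merging same-sized feature maps with channel re-ordering and 1 x 1 convolutions is the NumPy sketch below. It rests on assumptions not stated in the abstract: the channels are interleaved so that channel c of every frame sits side by side, and the 1 x 1 convolution is grouped, with one group per channel that reduces the n frames to a single map. The function names and the weight layout `(C, n)` are hypothetical.

```python
import numpy as np


def reorder_channels(feature_maps):
    """Stack n maps of shape (C, H, W) and interleave them so that channel c
    of every frame is adjacent. Returns an array of shape (C * n, H, W)."""
    stacked = np.stack(feature_maps, axis=0)            # (n, C, H, W)
    n, c, h, w = stacked.shape
    return stacked.transpose(1, 0, 2, 3).reshape(c * n, h, w)


def grouped_pointwise_merge(reordered, weights):
    """Grouped 1 x 1 convolution: weights has shape (C, n), and each group of
    n adjacent channels is reduced to one output channel."""
    c, n = weights.shape
    _, h, w = reordered.shape
    grouped = reordered.reshape(c, n, h, w)
    return np.einsum("cn,cnhw->chw", weights, grouped)  # (C, H, W)
```

In a trained network the weights would be learned; with uniform weights of 1/n the merge reduces to a per-channel average over the frames.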

SpotNet: Self-Attention Multi-Task Network for Object Detection

Feb 13, 2020

Hughes Perreault, Guillaume-Alexandre Bilodeau, Nicolas Saunier, Maguelonne Héritier

Abstract: Humans are very good at directing their visual attention toward relevant areas when they search for different types of objects. For instance, when we search for cars, we look at the streets, not at the tops of buildings. The motivation of this paper is to train a network to do the same via a multi-task learning approach. To train visual attention, we produce foreground/background segmentation labels in a semi-supervised way, using background subtraction or optical flow. Using these labels, we train an object detection model to produce foreground/background segmentation maps as well as bounding boxes while sharing most model parameters. We use those segmentation maps inside the network as a self-attention mechanism to weight the feature map used to produce the bounding boxes, decreasing the signal of irrelevant areas. We show that this method yields a significant mAP improvement on two traffic surveillance datasets, with state-of-the-art results on both UA-DETRAC and UAVDT.
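
The weighting step described above can be sketched in a few lines of NumPy. The details here are assumptions for illustration only: the segmentation output is treated as per-pixel logits, squashed with a sigmoid, and multiplied into every channel of the feature map. The function name `apply_attention` is hypothetical.

```python
import numpy as np


def apply_attention(features, seg_logits):
    """Weight a (C, H, W) feature map by a foreground probability map.

    seg_logits has shape (H, W); the sigmoid maps it to (0, 1), so background
    locations are attenuated while foreground locations pass almost unchanged.
    """
    attention = 1.0 / (1.0 + np.exp(-seg_logits))   # (H, W)
    return features * attention[None, :, :]         # broadcast over channels
```

Locations with strongly negative logits are pushed toward zero, which is the "decreasing the signal" effect the abstract refers to.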
