Romaric Audigier

Open-set object detection: towards unified problem formulation and benchmarking

Nov 08, 2024

Hejer Ammar, Nikita Kiselov, Guillaume Lapouge, Romaric Audigier

Abstract: In real-world applications where confidence is key, like autonomous driving, the accurate detection and appropriate handling of classes differing from those used during training are crucial. Despite the proposal of various unknown object detection approaches, we have observed widespread inconsistencies among them regarding the datasets, metrics, and scenarios used, alongside a notable absence of a clear definition for unknown objects, which hampers meaningful evaluation. To counter these issues, we introduce two benchmarks: a unified VOC-COCO evaluation, and the new OpenImagesRoad benchmark, which provides a clear hierarchical object definition in addition to new evaluation metrics. Complementing the benchmark, we exploit the performance of recent self-supervised Vision Transformers to improve pseudo-labeling-based Open-Set Object Detection (OSOD) through OW-DETR++. State-of-the-art methods are extensively evaluated on the proposed benchmarks. This study provides a clear problem definition, ensures consistent evaluations, and draws new conclusions about the effectiveness of OSOD strategies.

* Accepted at ECCV 2024 Workshop: "The 3rd Workshop for Out-of-Distribution Generalization in Computer Vision Foundation Models"

Towards Few-Annotation Learning for Object Detection: Are Transformer-based Models More Efficient ?

Oct 30, 2023

Quentin Bouniot, Angélique Loesch, Romaric Audigier, Amaury Habrard

Abstract: For specialized and dense downstream tasks such as object detection, labeling data requires expertise and can be very expensive, making few-shot and semi-supervised models much more attractive alternatives. While in the few-shot setup we observe that transformer-based object detectors perform better than convolution-based two-stage models for a similar number of parameters, they are not as effective when used with recent approaches in the semi-supervised setting. In this paper, we propose a semi-supervised method tailored for the current state-of-the-art object detector Deformable DETR in the few-annotation learning setup, using a student-teacher architecture that avoids relying on sensitive post-processing of the pseudo-labels generated by the teacher model. We evaluate our method on the semi-supervised object detection benchmarks COCO and Pascal VOC, and it outperforms previous methods, especially when annotations are scarce. We believe that our contributions open new possibilities to adapt similar object detection methods in this setup as well.

* Published at WACV 2023

Proposal-Contrastive Pretraining for Object Detection from Fewer Data

Oct 25, 2023

Quentin Bouniot, Romaric Audigier, Angélique Loesch, Amaury Habrard

Abstract: The use of pretrained deep neural networks represents an attractive way to achieve strong results when little data is available. When specialized in dense problems such as object detection, learning local rather than global information in images has proven to be more efficient. However, for unsupervised pretraining, the popular contrastive learning requires a large batch size and, therefore, a lot of resources. To address this problem, we are interested in transformer-based object detectors, which have recently gained traction in the community thanks to their good performance and their particularity of generating many diverse object proposals. In this work, we present Proposal Selection Contrast (ProSeCo), a novel unsupervised overall pretraining approach that leverages this property. ProSeCo uses the large number of object proposals generated by the detector for contrastive learning, which allows the use of a smaller batch size, combined with object-level features to learn local information in the images. To improve the effectiveness of the contrastive loss, we introduce the object location information in the selection of positive examples to take into account multiple overlapping object proposals. When reusing a pretrained backbone, we advocate for consistency in learning local information between the backbone and the detection head. We show that our method outperforms the state of the art in unsupervised pretraining for object detection on standard and novel benchmarks in learning with fewer data.

* Published as a conference paper at ICLR 2023
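
The location-aware selection of positive examples mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `iou` and `select_positives`, the `(x1, y1, x2, y2)` box format, and the 0.5 overlap threshold are assumptions made for illustration only. The idea shown is simply that overlapping proposals can be treated as positives for one another, based on intersection over union (IoU).

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_positives(anchor_boxes, candidate_boxes, threshold=0.5):
    """For each anchor proposal, list the indices of candidate proposals
    whose IoU with it reaches the (assumed) threshold."""
    return [
        [j for j, cand in enumerate(candidate_boxes) if iou(anchor, cand) >= threshold]
        for anchor in anchor_boxes
    ]
```

In a contrastive loss, the returned index lists would mark which candidate proposals count as positives for each anchor; all remaining proposals would act as negatives.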

Spatio-temporal predictive tasks for abnormal event detection in videos

Oct 27, 2022

Yassine Naji, Aleksandr Setkov, Angélique Loesch, Michèle Gouiffès, Romaric Audigier

Abstract: Abnormal event detection in videos is a challenging problem, partly due to the multiplicity of abnormal patterns and the lack of corresponding annotations. In this paper, we propose new constrained pretext tasks to learn object-level normality patterns. Our approach consists of learning a mapping between down-scaled visual queries and their corresponding normal appearance and motion characteristics at the original resolution. The proposed tasks are more challenging than the reconstruction and future frame prediction tasks widely used in the literature, since our model learns to jointly predict spatial and temporal features rather than reconstructing them. We believe that more constrained pretext tasks induce better learning of normality patterns. Experiments on several benchmark datasets demonstrate the effectiveness of our approach in localizing and tracking anomalies, as it outperforms or reaches the current state of the art on spatio-temporal evaluation metrics.

* Accepted at the 18th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 2022

Object-centric and memory-guided normality reconstruction for video anomaly detection

Mar 07, 2022

Khalil Bergaoui, Yassine Naji, Aleksandr Setkov, Angélique Loesch, Michèle Gouiffès, Romaric Audigier

Abstract: This paper addresses the video anomaly detection problem for video surveillance. Due to the inherent rarity and heterogeneity of abnormal events, the problem is approached through a normality modeling strategy, in which our model learns object-centric normal patterns without seeing anomalous samples during training. The main contributions consist in coupling pretrained object-level action feature prototypes with a cosine distance-based anomaly estimation function, thereby extending previous methods by introducing additional constraints to the mainstream reconstruction-based strategy. Our framework leverages both appearance and motion information to learn object-level behavior and captures prototypical patterns within a memory module. Experiments on several well-known datasets demonstrate the effectiveness of our method, as it outperforms the current state of the art on the most relevant spatio-temporal evaluation metrics.
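
A cosine distance-based anomaly estimate over stored prototypes can be sketched as follows. The abstract does not give the exact scoring function, so this is an assumption-laden illustration rather than the paper's method: `cosine_distance` and `anomaly_score` are hypothetical names, and scoring an object by its distance to the closest normal prototype is one plausible choice among several.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: a zero vector has no direction, so treat it as
        # orthogonal to everything (distance 1).
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def anomaly_score(feature, prototypes):
    """Score an object feature by its distance to the nearest prototype.

    `prototypes` stands in for the normal patterns held in a memory
    module; a larger score means the feature looks less like any of them.
    """
    return min(cosine_distance(feature, p) for p in prototypes)
```

Under this sketch, a feature aligned with any stored prototype scores close to 0, while a feature pointing away from all of them scores higher.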

End-to-end Person Search Sequentially Trained on Aggregated Dataset

Jan 24, 2022

Angélique Loesch, Jaonary Rabarisoa, Romaric Audigier

Abstract: In video surveillance applications, person search is a challenging task that consists of detecting people and extracting features from their silhouettes for re-identification (re-ID) purposes. We propose a new end-to-end model that jointly computes the detection and feature extraction steps through a single deep Convolutional Neural Network architecture. Sharing feature maps between the two tasks to jointly describe people's commonalities and specificities allows faster runtime, which is valuable in real-world applications. In addition to reaching state-of-the-art accuracy, this multi-task model can be sequentially trained task-by-task, which results in a broader acceptance of input dataset types. Indeed, we show that aggregating more pedestrian detection datasets without costly identity annotations makes the shared feature maps more generic and improves re-ID precision. Moreover, these boosted shared feature maps result in re-ID features that are more robust in a cross-dataset scenario.

* Published in: 2019 IEEE International Conference on Image Processing (ICIP)
* 5 pages

Describe me if you can! Characterized Instance-level Human Parsing

Jan 24, 2022

Angélique Loesch, Romaric Audigier

Abstract: Several computer vision applications such as person search or online fashion rely on human description. The use of instance-level human parsing (HP) is therefore relevant since it localizes semantic attributes and body parts within a person. But how should these attributes be characterized? To our knowledge, only some single-HP datasets describe attributes with some color, size and/or pattern characteristics. There is a lack of datasets for multi-HP in the wild with such characteristics. In this article, we propose the dataset CCIHP based on the multi-HP dataset CIHP, with 20 new labels covering these 3 kinds of characteristics. In addition, we propose HPTR, a new bottom-up multi-task method based on transformers as a fast and scalable baseline. It is the fastest among state-of-the-art multi-HP methods while having precision comparable to the most precise bottom-up method. We hope this will encourage research on fast and accurate methods for precise human description.

* Published in: 2021 IEEE International Conference on Image Processing (ICIP)
* 5 pages

Detecting Human-to-Human-or-Object (H2O) Interactions with DIABOLO

Jan 07, 2022

Astrid Orcesi, Romaric Audigier, Fritz Poka Toukam, Bertrand Luvison

Abstract: Detecting human interactions is crucial for human behavior analysis. Many methods have been proposed to deal with Human-to-Object Interaction (HOI) detection, i.e., detecting in an image which person and object interact together and classifying the type of interaction. However, Human-to-Human Interactions, such as social and violent interactions, are generally not considered in available HOI training datasets. As we think these types of interactions cannot be ignored and decorrelated from HOI when analyzing human behavior, we propose a new interaction dataset to deal with both types of human interactions: Human-to-Human-or-Object (H2O). In addition, we introduce a novel taxonomy of verbs, intended to be closer to a description of human body attitude in relation to the surrounding targets of interaction, and more independent of the environment. Unlike some existing datasets, we strive to avoid defining synonymous verbs when their use highly depends on the target type or requires a high level of semantic interpretation. As the H2O dataset includes V-COCO images annotated with this new taxonomy, images obviously contain more interactions. This can be an issue for HOI detection methods whose complexity depends on the number of people, targets or interactions. Thus, we propose DIABOLO (Detecting InterActions By Only Looking Once), an efficient subject-centric single-shot method to detect all interactions in one forward pass, with constant inference time independent of image content. In addition, this multi-task network simultaneously detects all people and objects. We show how sharing a network for these tasks not only saves computational resources but also improves performance collaboratively. Finally, DIABOLO is a strong baseline for the newly proposed challenge of H2O Interaction detection, as it outperforms all state-of-the-art methods when trained and evaluated on the HOI dataset V-COCO.

* Accepted at the IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021)

A formal approach to good practices in Pseudo-Labeling for Unsupervised Domain Adaptive Re-Identification

Dec 28, 2021

Fabian Dubourvieux, Romaric Audigier, Angélique Loesch, Samia Ainouz, Stéphane Canu

Abstract: The use of pseudo-labels prevails in order to tackle Unsupervised Domain Adaptive (UDA) Re-Identification (re-ID) with the best performance. Indeed, this family of approaches has given rise to several effective UDA re-ID specific frameworks. In these works, research directions to improve Pseudo-Labeling UDA re-ID performance are varied and mostly based on intuition and experiments: refining pseudo-labels, reducing the impact of errors in pseudo-labels, etc. It can be hard to deduce from them general good practices that can be implemented in any Pseudo-Labeling method to consistently improve its performance. To address this key question, a new theoretical view on Pseudo-Labeling UDA re-ID is proposed. The contributions are threefold: (i) A novel theoretical framework for Pseudo-Labeling UDA re-ID, formalized through a new general learning upper-bound on the UDA re-ID performance. (ii) General good practices for Pseudo-Labeling, directly deduced from the interpretation of the proposed theoretical framework, in order to improve the target re-ID performance. (iii) Extensive experiments on challenging person and vehicle cross-dataset re-ID tasks, showing consistent performance improvements for various state-of-the-art methods and various proposed implementations of good practices.

* This paper is a preprint under submission at CVIU for review

Improving Unsupervised Domain Adaptive Re-Identification via Source-Guided Selection of Pseudo-Labeling Hyperparameters

Nov 04, 2021

Fabian Dubourvieux, Angélique Loesch, Romaric Audigier, Samia Ainouz, Stéphane Canu

Abstract: Unsupervised Domain Adaptation (UDA) for re-identification (re-ID) is a challenging task: to avoid a costly annotation of additional data, it aims at transferring knowledge from a domain with annotated data to a domain of interest with only unlabeled data. Pseudo-labeling approaches have proven to be effective for UDA re-ID. However, the effectiveness of these approaches heavily depends on the choice of some hyperparameters (HP) that affect the generation of pseudo-labels by clustering. The lack of annotation in the domain of interest makes this choice non-trivial. Current approaches simply reuse the same empirical value for all adaptation tasks, regardless of the target data representation, which changes through the pseudo-labeling training phases. As this simplistic choice may limit their performance, we aim at addressing this issue. We propose new theoretical grounds on HP selection for clustering UDA re-ID as well as a method of automatic and cyclic HP tuning for pseudo-labeling UDA clustering: HyPASS. HyPASS consists of incorporating two modules in pseudo-labeling methods: (i) HP selection based on a labeled source validation set and (ii) conditional domain alignment of feature discriminativeness to improve HP selection based on source samples. Experiments on commonly used person re-ID and vehicle re-ID datasets show that our proposed HyPASS consistently improves the best state-of-the-art methods in re-ID compared to the commonly used empirical HP setting.

* Preprint version. Accepted in IEEE Access (see IEEE Access for final version)
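
The first module described in this abstract, selecting a clustering hyperparameter on a labeled source validation set, can be sketched in a toy setting. Nothing here reproduces HyPASS itself: the scalar features, the single-linkage clustering with a distance threshold `eps`, the pairwise F1 criterion, and the names `cluster`, `pairwise_f1` and `select_eps` are all assumptions chosen to keep the example self-contained. The conditional domain alignment module is not modeled.

```python
from itertools import combinations


def cluster(features, eps):
    """Toy single-linkage clustering on scalar features: points closer
    than or equal to eps end up in the same cluster (union-find)."""
    parent = list(range(len(features)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(features)), 2):
        if abs(features[i] - features[j]) <= eps:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(features))]


def pairwise_f1(pred, labels):
    """F1 over sample pairs: a pair is positive when both samples share
    a cluster (prediction) or an identity (ground truth)."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels)), 2):
        same_pred = pred[i] == pred[j]
        same_true = labels[i] == labels[j]
        tp += same_pred and same_true
        fp += same_pred and not same_true
        fn += same_true and not same_pred
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def select_eps(source_features, source_labels, candidates):
    """Pick the candidate eps whose clustering best matches source labels."""
    return max(
        candidates,
        key=lambda eps: pairwise_f1(cluster(source_features, eps), source_labels),
    )
```

In a cyclic tuning scheme, such a selection step would be rerun as the feature representation changes during training, instead of fixing one empirical value for every adaptation task.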
