Abstract: Transformer networks have proven extremely powerful for a wide variety of tasks since their introduction. Computer vision is no exception: transformers have become very popular in the vision community in recent years. Despite this trend, multiple-object tracking (MOT) has so far shown some incompatibility with transformers. We argue that the standard representation (bounding boxes) is not well suited to learning transformers for MOT. Inspired by recent research, we propose TransCenter, the first transformer-based architecture for tracking the centers of multiple targets. Methodologically, we propose using dense queries in a double-decoder network to robustly infer the heatmap of the targets' centers and associate them through time. TransCenter outperforms the current state of the art in multiple-object tracking on both MOT17 and MOT20. Our ablation study demonstrates the advantage of the proposed architecture over more naive alternatives. The code will be made publicly available.
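To make the center-heatmap representation concrete, here is a minimal sketch, assuming a CenterNet-style target: a 2-D Gaussian is placed at each box center, and centers are recovered as 3x3 local maxima above a threshold. This is not TransCenter's dense-query double decoder; the function names, the `(x1, y1, x2, y2)` box format, `sigma`, and `threshold` are all hypothetical choices for illustration.

```python
import numpy as np


def render_center_heatmap(boxes, height, width, sigma=2.0):
    """Heatmap with a 2-D Gaussian peak at each box center.

    boxes: iterable of (x1, y1, x2, y2) pixel coordinates (assumed format).
    """
    ys, xs = np.mgrid[0:height, 0:width]  # row and column index grids
    heatmap = np.zeros((height, width), dtype=np.float64)
    for x1, y1, x2, y2 in boxes:
        cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
        gaussian = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
        # Overlapping targets keep the element-wise maximum rather than the sum.
        heatmap = np.maximum(heatmap, gaussian)
    return heatmap


def extract_centers(heatmap, threshold=0.5):
    """Return (x, y) of pixels that are 3x3 local maxima at or above threshold."""
    h, w = heatmap.shape
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    # Maximum over each pixel's 3x3 neighborhood (including the pixel itself).
    neighborhood_max = np.max(
        np.stack([padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]),
        axis=0,
    )
    peaks = (heatmap == neighborhood_max) & (heatmap >= threshold)
    rows, cols = np.nonzero(peaks)
    return list(zip(cols.tolist(), rows.tolist()))
```

For example, two boxes `(10, 10, 20, 20)` and `(40, 30, 50, 40)` on a 64x64 map yield peaks at `(15, 15)` and `(45, 35)`. Unlike a box, each target is a single point, which is one way to read the abstract's argument for dropping bounding boxes as the learning target.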
Abstract: Unsupervised person re-identification (Re-ID) methods train on a carefully labeled source dataset and then generalize to an unlabeled target dataset, i.e., one for which person-identity information is unavailable. Inspired by domain adaptation techniques, these methods avoid a costly, tedious, and often unaffordable labeling process. This paper investigates the use of camera-index information, namely which camera captured which image, for unsupervised person Re-ID. More precisely, inspired by adversarial domain adaptation approaches, we develop an adversarial framework in which the output of the feature extractor should be useful for person Re-ID and, at the same time, should fool a camera discriminator. We refer to the proposed method as camera adversarial transfer (CAT). We evaluate several adversarial variants and, for each, the camera robustness it achieves. We report cross-dataset Re-ID performance and compare the variants of our method with several state-of-the-art methods, thus showing the benefit of exploiting camera-index information within an adversarial framework for unsupervised person Re-ID.
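The adversarial objective (features useful for Re-ID while fooling a camera discriminator) can be sketched as plain loss computations. The two feature-extractor variants below, a gradient-reversal-style negated discriminator loss and a "confusion" loss toward a uniform camera distribution, are common adversarial formulations assumed here for illustration; they are not necessarily the variants evaluated for CAT. The weight `lam`, all function names, and the assumption that person labels come from the labeled source data are hypothetical. Only the losses are shown; the alternating discriminator/extractor updates would need an autodiff framework.

```python
import numpy as np


def softmax(logits):
    # Numerically stable row-wise softmax over class logits.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def cross_entropy(logits, labels):
    # Mean negative log-likelihood of the integer labels.
    probs = softmax(logits)
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels])))


def discriminator_loss(camera_logits, camera_ids):
    # The camera discriminator is trained to predict which camera took each image.
    return cross_entropy(camera_logits, camera_ids)


def camera_confusion_loss(camera_logits):
    # Cross-entropy between a uniform camera distribution and the prediction;
    # minimal (log of the camera count) when cameras are indistinguishable.
    log_probs = np.log(softmax(camera_logits))
    return float(-np.mean(log_probs.mean(axis=1)))


def extractor_loss_reversal(reid_logits, person_ids, camera_logits, camera_ids, lam=0.1):
    # Variant A (gradient-reversal style): minimize the Re-ID loss while
    # maximizing the discriminator's loss.
    return cross_entropy(reid_logits, person_ids) - lam * discriminator_loss(camera_logits, camera_ids)


def extractor_loss_confusion(reid_logits, person_ids, camera_logits, lam=0.1):
    # Variant B (assumed): minimize the Re-ID loss while pushing camera
    # predictions toward the uniform distribution.
    return cross_entropy(reid_logits, person_ids) + lam * camera_confusion_loss(camera_logits)
```

With uninformative camera logits (all zeros over three cameras), both `discriminator_loss` and `camera_confusion_loss` equal log 3, which is the state the feature extractor is pushed toward. Variant A can grow without bound as the discriminator is fooled ever more confidently, which is one reason confusion-style targets are sometimes preferred.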