Abstract:Sparse 3D detectors have received significant attention since the query-based paradigm embraces low latency without explicit dense BEV feature construction. However, these detectors achieve worse performance than their dense counterparts. In this paper, we find the key to bridging the performance gap is to enhance the awareness of rich representations in two modalities. Here, we present a high-performance fully sparse detector for end-to-end multi-modality 3D object detection. The detector, termed SparseLIF, contains three key designs, which are (1) Perspective-Aware Query Generation (PAQG) to generate high-quality 3D queries with perspective priors, (2) RoI-Aware Sampling (RIAS) to further refine prior queries by sampling RoI features from each modality, (3) Uncertainty-Aware Fusion (UAF) to precisely quantify the uncertainty of each sensor modality and adaptively conduct final multi-modality fusion, thus achieving great robustness against sensor noises. By the time of submission (2024/03/08), SparseLIF achieves state-of-the-art performance on the nuScenes dataset, ranking 1st on both validation set and test benchmark, outperforming all state-of-the-art 3D object detectors by a notable margin. The source code will be released upon acceptance.
Abstract:Image retrieval has been a top topic in the field of both computer vision and machine learning for a long time. Content based image retrieval, which tries to retrieve images from a database visually similar to a query image, has attracted much attention. Two most important issues of image retrieval are the representation and ranking of the images. Recently, bag-of-words based method has shown its power as a representation method. Moreover, nonnegative matrix factorization is also a popular way to represent the data samples. In addition, contextual similarity learning has also been studied and proven to be an effective method for the ranking problem. However, these technologies have never been used together. In this paper, we developed an effective image retrieval system by representing each image using the bag-of-words method as histograms, and then apply the nonnegative matrix factorization to factorize the histograms, and finally learn the ranking score using the contextual similarity learning method. The proposed novel system is evaluated on a large scale image database and the effectiveness is shown.