Abstract:We investigate whether region-based representations are effective for recognition. Regions were once a mainstay in recognition approaches, but pixel and patch-based features are now used almost exclusively. We show that recent class-agnostic segmenters like SAM can be effectively combined with strong unsupervised representations like DINOv2 and used for a wide variety of tasks, including semantic segmentation, object-based image retrieval, and multi-image analysis. Once the masks and features are extracted, these representations, even with linear decoders, enable competitive performance, making them well suited to applications that require custom queries. The compactness of the representation also makes it well-suited to video analysis and other problems requiring inference across many images.
Abstract:This paper aims at finding a method to register two different point clouds constructed by ORB-SLAM2 and OpenSfM. To do this, we post some tags with unique textures in the scene and take videos and photos of that area. Then we take short videos of only the tags to extract their features. By matching the ORB feature of the tags with their corresponding features in the scene, it is then possible to localize the position of these tags both in point clouds constructed by ORB-SLAM2 and OpenSfM. Thus, the best transformation matrix between two point clouds can be calculated, and the two point clouds can be aligned.