Abstract:In the evolving landscape of computer vision, foundation models have emerged as pivotal tools, exhibiting exceptional adaptability to a myriad of tasks. Among these, the Segment Anything Model (SAM) by Meta AI has distinguished itself in image segmentation. However, SAM, like its counterparts, encounters limitations in specific niche applications, prompting a quest for enhancement strategies that do not compromise its inherent capabilities. This paper introduces ASAM, a novel methodology that amplifies SAM's performance through adversarial tuning. We harness the potential of natural adversarial examples, inspired by their successful implementation in natural language processing. By utilizing a stable diffusion model, we augment a subset (1%) of the SA-1B dataset, generating adversarial instances that are more representative of natural variations rather than conventional imperceptible perturbations. Our approach maintains the photorealism of adversarial examples and ensures alignment with original mask annotations, thereby preserving the integrity of the segmentation task. The fine-tuned ASAM demonstrates significant improvements across a diverse range of segmentation tasks without necessitating additional data or architectural modifications. The results of our extensive evaluations confirm that ASAM establishes new benchmarks in segmentation tasks, thereby contributing to the advancement of foundational models in computer vision. Our project page is in https://asam2024.github.io/.
Abstract:Co-salient Object Detection (CoSOD) endeavors to replicate the human visual system's capacity to recognize common and salient objects within a collection of images. Despite recent advancements in deep learning models, these models still rely on training with well-annotated CoSOD datasets. The exploration of training-free zero-shot CoSOD frameworks has been limited. In this paper, taking inspiration from the zero-shot transfer capabilities of foundational computer vision models, we introduce the first zero-shot CoSOD framework that harnesses these models without any training process. To achieve this, we introduce two novel components in our proposed framework: the group prompt generation (GPG) module and the co-saliency map generation (CMP) module. We evaluate the framework's performance on widely-used datasets and observe impressive results. Our approach surpasses existing unsupervised methods and even outperforms fully supervised methods developed before 2020, while remaining competitive with some fully supervised methods developed before 2022.
Abstract:SAM is a segmentation model recently released by Meta AI Research and has been gaining attention quickly due to its impressive performance in generic object segmentation. However, its ability to generalize to specific scenes such as camouflaged scenes is still unknown. Camouflaged object detection (COD) involves identifying objects that are seamlessly integrated into their surroundings and has numerous practical applications in fields such as medicine, art, and agriculture. In this study, we try to ask if SAM can address the COD task and evaluate the performance of SAM on the COD benchmark by employing maximum segmentation evaluation and camouflage location evaluation. We also compare SAM's performance with 22 state-of-the-art COD methods. Our results indicate that while SAM shows promise in generic object segmentation, its performance on the COD task is limited. This presents an opportunity for further research to explore how to build a stronger SAM that may address the COD task. The results of this paper are provided in \url{https://github.com/luckybird1994/SAMCOD}.