Abstract: Camouflaged Object Segmentation (COS) faces significant challenges due to the scarcity of annotated data, where meticulous pixel-level annotation is both labor-intensive and costly, primarily because of the intricate object-background boundaries. Addressing the core question, "Can COS be effectively achieved in a zero-shot manner without manual annotations for any camouflaged object?", we respond affirmatively and introduce a robust zero-shot COS framework. This framework leverages the inherent local pattern bias of COS and employs a broad semantic feature space derived from salient object segmentation (SOS) for efficient zero-shot transfer. We incorporate a Masked Image Modeling (MIM)-based image encoder adapted via Parameter-Efficient Fine-Tuning (PEFT), a Multimodal Large Language Model (M-LLM), and a Multi-scale Fine-grained Alignment (MFA) mechanism. The MIM pre-trained image encoder focuses on capturing essential low-level features, while the M-LLM generates caption embeddings processed alongside these visual cues. These embeddings are precisely aligned using MFA, enabling our framework to accurately interpret and navigate complex semantic contexts. To improve efficiency, we introduce a learnable codebook that stands in for the M-LLM during inference, significantly reducing computational overhead. Our framework demonstrates its versatility and efficacy through rigorous experimentation, achieving state-of-the-art performance in zero-shot COS with $F_{\beta}^w$ scores of 72.9\% on CAMO and 71.7\% on COD10K. By removing the M-LLM during inference, we achieve an inference speed comparable to that of traditional end-to-end models, reaching 18.1 FPS. Code: https://github.com/R-LEI360725/ZSCOS-CaMF
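The following is a minimal, hypothetical PyTorch sketch of the inference-time pipeline suggested by this abstract: a frozen backbone extracts multi-scale visual features, a learnable codebook substitutes for the M-LLM's caption embeddings, and a cross-attention block plays the role of the MFA alignment. All module names, shapes, and the simplified convolutional "encoder" are illustrative assumptions, not the paper's actual architecture; refer to the linked repository for the real implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MFABlock(nn.Module):
    """Illustrative alignment stage: visual tokens cross-attend to semantic embeddings."""
    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, visual_tokens, semantic_tokens):
        # Queries are visual features; keys/values are caption (or codebook) embeddings.
        aligned, _ = self.attn(visual_tokens, semantic_tokens, semantic_tokens)
        return self.norm(visual_tokens + aligned)

class ZeroShotCOSSketch(nn.Module):
    """Toy stand-in for the described framework (not the released model)."""
    def __init__(self, dim=256, num_scales=3, codebook_size=64):
        super().__init__()
        # Stand-in for the MIM pre-trained encoder (PEFT adapters omitted).
        self.encoder = nn.ModuleList(
            nn.Conv2d(3 if i == 0 else dim, dim, 3, stride=2, padding=1)
            for i in range(num_scales)
        )
        # Learnable codebook replacing the M-LLM caption embeddings at inference.
        self.codebook = nn.Parameter(torch.randn(codebook_size, dim))
        self.mfa = nn.ModuleList(MFABlock(dim) for _ in range(num_scales))
        self.head = nn.Conv2d(dim, 1, 1)  # per-pixel mask logits

    def forward(self, image):
        b = image.size(0)
        semantics = self.codebook.unsqueeze(0).expand(b, -1, -1)
        x = image
        for conv, mfa in zip(self.encoder, self.mfa):
            x = torch.relu(conv(x))
            h, w = x.shape[-2:]
            tokens = x.flatten(2).transpose(1, 2)        # (B, H*W, C)
            tokens = mfa(tokens, semantics)              # align with semantic embeddings
            x = tokens.transpose(1, 2).reshape(b, -1, h, w)
        mask = self.head(x)
        return F.interpolate(mask, size=image.shape[-2:], mode="bilinear", align_corners=False)

model = ZeroShotCOSSketch()
pred = model(torch.randn(1, 3, 384, 384))  # (1, 1, 384, 384) camouflage mask logits
```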
Abstract: Image stitching algorithms often adopt a global transformation, such as a homography, and work well for planar scenes or parallax-free camera motions. However, these conditions are easily violated in practice. With casual camera motions, varying viewpoints, large depth changes, or complex structures, stitching such images becomes challenging. The global transformation model often yields poor stitching results, such as misalignments and projective distortions, especially perspective distortion. To this end, we propose a perspective-preserving warp for image stitching that spatially combines local projective transformations with a similarity transformation. Using a weighted combination scheme, our approach gradually extrapolates the local projective transformations of the overlapping regions into the non-overlapping regions, so that the final warp changes smoothly from projective to similarity. The proposed method provides satisfactory alignment accuracy while reducing projective distortions and maintaining the multi-perspective view. Experiments on a variety of challenging images confirm the effectiveness of the approach.
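Below is a minimal NumPy sketch of the weighted-combination idea described above, assuming a local homography and a global similarity transform have already been estimated by some other means. The blending function, the linear fade, and all parameter names (e.g. overlap_right, fade_width) are illustrative assumptions, not the paper's actual weighting scheme.

```python
import numpy as np

def apply_transform(T, pts):
    """Apply a 3x3 projective transform to Nx2 points with homogeneous normalization."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    out = pts_h @ T.T
    return out[:, :2] / out[:, 2:3]

def blend_weight(x, overlap_right, fade_width):
    """1 inside the overlapping region, decaying linearly to 0 in the non-overlap region."""
    return np.clip(1.0 - (x - overlap_right) / fade_width, 0.0, 1.0)

def warp_points(pts, H_local, S_global, overlap_right=400.0, fade_width=300.0):
    """Blend projective and similarity warps per point.

    w -> 1 keeps the projective warp (accurate alignment in the overlap);
    w -> 0 falls back to the similarity warp, avoiding perspective distortion
    in the extrapolated, non-overlapping region.
    """
    w = blend_weight(pts[:, 0], overlap_right, fade_width)[:, None]
    proj = apply_transform(H_local, pts)
    simi = apply_transform(S_global, pts)
    return w * proj + (1.0 - w) * simi

# Toy example: a mildly perspective homography and a rotation+translation similarity;
# points to the right of x = 400 are progressively dominated by the similarity warp.
H = np.array([[1.0, 0.02, 5.0], [0.01, 1.0, 3.0], [1e-4, 0.0, 1.0]])
theta = np.deg2rad(2.0)
S = np.array([[np.cos(theta), -np.sin(theta), 5.0],
              [np.sin(theta),  np.cos(theta), 3.0],
              [0.0,            0.0,           1.0]])
pts = np.array([[100.0, 200.0], [500.0, 200.0], [900.0, 200.0]])
print(warp_points(pts, H, S))
```

In practice the weight would vary smoothly over a mesh of image cells with a per-cell local homography (as in moving-DLT style warps), but the per-point blend shown here captures the projective-to-similarity transition the abstract describes.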