Tengping Jiang

Token is All You Need for Zero-Shot Semantic Segmentation

Apr 13, 2023

Letian Wu, Wenyao Zhang, Tengping Jiang, Wankou Yang, Xin Jin, Wenjun Zeng

Abstract: In this paper, we propose an embarrassingly simple yet highly effective zero-shot semantic segmentation (ZS3) method based on the pre-trained vision-language model CLIP. First, our study provides two key discoveries: (i) the global tokens (a.k.a. [CLS] tokens in Transformer) of the text branch in CLIP provide a powerful representation of semantic information, and (ii) these text-side [CLS] tokens can be regarded as category priors that guide the CLIP visual encoder to pay more attention to the corresponding region of interest. Based on that, we build upon the CLIP model as a backbone, which we extend with a one-way [CLS] token navigation from the text branch to the visual branch that enables zero-shot dense prediction, dubbed ClsCLIP. Specifically, we use the [CLS] token output from the text branch as an auxiliary semantic prompt to replace the [CLS] token in shallow layers of the ViT-based visual encoder. This one-way navigation embeds the global category prior earlier and thus promotes semantic segmentation. Furthermore, to better segment tiny objects in ZS3, we enhance ClsCLIP with a local zoom-in strategy that employs region-proposal pre-processing, yielding ClsCLIP+. Extensive experiments demonstrate that our proposed ZS3 method achieves SOTA performance, and it is even comparable with few-shot semantic segmentation methods.

* 8 pages, 6 figures
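
The token replacement described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: `toy_layer` is a hypothetical stand-in for a ViT block (single-head self-attention with random weights), the text and visual embeddings are assumed to share one width, and the function names and the `num_shallow` parameter are invented for illustration. The sketch covers only the step of overwriting the visual [CLS] token with the text-side [CLS] token in the first few layers; it does not produce a segmentation mask.

```python
import numpy as np

def toy_layer(tokens, w_qkv):
    """Stand-in for one Transformer block: single-head self-attention plus residual."""
    d = tokens.shape[1]
    q, k, v = np.split(tokens @ w_qkv, 3, axis=1)
    scores = q @ k.T / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)  # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    return tokens + attn @ v

def encode_with_text_cls(patch_tokens, visual_cls, text_cls, layer_weights, num_shallow):
    """Run the toy encoder, replacing row 0 ([CLS]) with text_cls in the first num_shallow layers."""
    tokens = np.vstack([visual_cls[None, :], patch_tokens])
    for i, w in enumerate(layer_weights):
        if i < num_shallow:
            tokens[0] = text_cls  # one-way: text prior goes into the visual branch
        tokens = toy_layer(tokens, w)
    return tokens

# Toy data (all values are random placeholders, not CLIP features).
rng = np.random.default_rng(0)
d, num_patches, num_layers = 8, 4, 3
patch_tokens = rng.normal(size=(num_patches, d))
visual_cls = rng.normal(size=d)
text_cls = rng.normal(size=d)
layer_weights = [0.1 * rng.normal(size=(d, 3 * d)) for _ in range(num_layers)]

baseline = encode_with_text_cls(patch_tokens, visual_cls, text_cls, layer_weights, num_shallow=0)
guided = encode_with_text_cls(patch_tokens, visual_cls, text_cls, layer_weights, num_shallow=2)
```

In this toy setup, once the replacement happens at the first layer the original visual [CLS] token no longer influences the output, so the patch tokens attend to the text-derived prior from the start.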

AdaFit: Rethinking Learning-based Normal Estimation on Point Clouds

Aug 12, 2021

Runsong Zhu, Yuan Liu, Zhen Dong, Tengping Jiang, Yuan Wang, Wenping Wang, Bisheng Yang

Abstract: This paper presents a neural network for robust normal estimation on point clouds, named AdaFit, that can deal with point clouds with noise and density variations. Existing works use a network to learn point-wise weights for weighted least squares surface fitting to estimate the normals, which has difficulty finding accurate normals in complex regions or in regions containing noisy points. By analyzing the step of weighted least squares surface fitting, we find that it is hard to determine the polynomial order of the fitting surface and that the fitting surface is sensitive to outliers. To address these problems, we propose a simple yet effective solution that adds an offset prediction to improve the quality of normal estimation. Furthermore, in order to take advantage of points from different neighborhood sizes, a novel Cascaded Scale Aggregation layer is proposed to help the network predict more accurate point-wise offsets and weights. Extensive experiments demonstrate that AdaFit achieves state-of-the-art performance on both the synthetic PCPNet dataset and the real-world SceneNN dataset.

* ICCV 2021 project page: https://runsong123.github.io/AdaFit/
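
The weighted least squares fitting step that the abstract analyzes can be sketched as follows. This is a minimal illustration, not AdaFit itself: it fits only a first-order surface `z = a*x + b*y + c`, assumes the neighborhood is already aligned so that the normal points roughly along z, and uses hand-set weights and offsets in place of the values a network would predict. The function name `weighted_plane_normal` is hypothetical.

```python
import numpy as np

def weighted_plane_normal(points, weights, offsets=None):
    """Fit z = a*x + b*y + c by weighted least squares and return the unit normal.

    points:  (N, 3) neighborhood points
    weights: (N,) non-negative per-point weights
    offsets: optional (N, 3) per-point offsets added to the points before fitting
    """
    pts = np.asarray(points, dtype=float)
    if offsets is not None:
        pts = pts + np.asarray(offsets, dtype=float)
    sqrt_w = np.sqrt(np.asarray(weights, dtype=float))
    # Scale each row by sqrt(weight) so ordinary least squares minimizes the weighted error.
    design = np.column_stack([pts[:, 0], pts[:, 1], np.ones(len(pts))]) * sqrt_w[:, None]
    target = pts[:, 2] * sqrt_w
    a, b, _ = np.linalg.lstsq(design, target, rcond=None)[0]
    normal = np.array([-a, -b, 1.0])
    return normal / np.linalg.norm(normal)

# A flat 3x3 patch on z = 0 with one outlier lifted to z = 2 (toy data).
xs, ys = np.meshgrid([-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0])
points = np.column_stack([xs.ravel(), ys.ravel(), np.zeros(9)])
points[8, 2] = 2.0  # the corner point (1, 1) becomes an outlier

uniform = np.ones(9)
n_plain = weighted_plane_normal(points, uniform)            # tilted by the outlier

down_weighted = uniform.copy()
down_weighted[8] = 0.0                                      # stand-in for a learned weight
n_weighted = weighted_plane_normal(points, down_weighted)

offsets = np.zeros((9, 3))
offsets[8, 2] = -2.0                                        # stand-in for a learned offset
n_offset = weighted_plane_normal(points, uniform, offsets)
```

With uniform weights the single outlier tilts the fitted plane away from the true normal (0, 0, 1). Either down-weighting the outlier or shifting it back onto the surface with an offset recovers the true normal in this toy case.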
