Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jianghang Lin

S$^2$Teacher: Step-by-step Teacher for Sparsely Annotated Oriented Object Detection

Apr 15, 2025

Yu Lin, Jianghang Lin, Kai Ye, You Shen, Yan Zhang, Shengchuan Zhang, Liujuan Cao, Rongrong Ji

Abstract:Although fully-supervised oriented object detection has made significant progress in multimodal remote sensing image understanding, it comes at the cost of labor-intensive annotation. Recent studies have explored weakly and semi-supervised learning to alleviate this burden. However, these methods overlook the difficulties posed by dense annotations in complex remote sensing scenes. In this paper, we introduce a novel setting called sparsely annotated oriented object detection (SAOOD), which only labels partial instances, and propose a solution to address its challenges. Specifically, we focus on two key issues in the setting: (1) sparse labeling leading to overfitting on limited foreground representations, and (2) unlabeled objects (false negatives) confusing feature learning. To this end, we propose the S$^2$Teacher, a novel method that progressively mines pseudo-labels for unlabeled objects, from easy to hard, to enhance foreground representations. Additionally, it reweights the loss of unlabeled objects to mitigate their impact during training. Extensive experiments demonstrate that S$^2$Teacher not only significantly improves detector performance across different sparse annotation levels but also achieves near-fully-supervised performance on the DOTA dataset with only 10% annotation instances, effectively balancing detection accuracy with annotation efficiency. The code will be public.

Via

Access Paper or Ask Questions

AnomalyPainter: Vision-Language-Diffusion Synergy for Zero-Shot Realistic and Diverse Industrial Anomaly Synthesis

Mar 11, 2025

Zhangyu Lai, Yilin Lu, Xinyang Li, Jianghang Lin, Yansong Qu, Liujuan Cao, Ming Li, Rongrong Ji

Abstract:While existing anomaly synthesis methods have made remarkable progress, achieving both realism and diversity in synthesis remains a major obstacle. To address this, we propose AnomalyPainter, a zero-shot framework that breaks the diversity-realism trade-off dilemma through synergizing Vision Language Large Model (VLLM), Latent Diffusion Model (LDM), and our newly introduced texture library Tex-9K. Tex-9K is a professional texture library containing 75 categories and 8,792 texture assets crafted for diverse anomaly synthesis. Leveraging VLLM's general knowledge, reasonable anomaly text descriptions are generated for each industrial object and matched with relevant diverse textures from Tex-9K. These textures then guide the LDM via ControlNet to paint on normal images. Furthermore, we introduce Texture-Aware Latent Init to stabilize the natural-image-trained ControlNet for industrial images. Extensive experiments show that AnomalyPainter outperforms existing methods in realism, diversity, and generalization, achieving superior downstream performance.

* anomaly synthesis,anomaly detection

Via

Access Paper or Ask Questions

Exploring Semantic Consistency and Style Diversity for Domain Generalized Semantic Segmentation

Dec 16, 2024

Hongwei Niu, Linhuang Xie, Jianghang Lin, Shengchuan Zhang

Figure 1 for Exploring Semantic Consistency and Style Diversity for Domain Generalized Semantic Segmentation

Figure 2 for Exploring Semantic Consistency and Style Diversity for Domain Generalized Semantic Segmentation

Figure 3 for Exploring Semantic Consistency and Style Diversity for Domain Generalized Semantic Segmentation

Figure 4 for Exploring Semantic Consistency and Style Diversity for Domain Generalized Semantic Segmentation

Abstract:Domain Generalized Semantic Segmentation (DGSS) seeks to utilize source domain data exclusively to enhance the generalization of semantic segmentation across unknown target domains. Prevailing studies predominantly concentrate on feature normalization and domain randomization, these approaches exhibit significant limitations. Feature normalization-based methods tend to confuse semantic features in the process of constraining the feature space distribution, resulting in classification misjudgment. Domain randomization-based methods frequently incorporate domain-irrelevant noise due to the uncontrollability of style transformations, resulting in segmentation ambiguity. To address these challenges, we introduce a novel framework, named SCSD for Semantic Consistency prediction and Style Diversity generalization. It comprises three pivotal components: Firstly, a Semantic Query Booster is designed to enhance the semantic awareness and discrimination capabilities of object queries in the mask decoder, enabling cross-domain semantic consistency prediction. Secondly, we develop a Text-Driven Style Transform module that utilizes domain difference text embeddings to controllably guide the style transformation of image features, thereby increasing inter-domain style diversity. Lastly, to prevent the collapse of similar domain feature spaces, we introduce a Style Synergy Optimization mechanism that fortifies the separation of inter-domain features and the aggregation of intra-domain features by synergistically weighting style contrastive loss and style aggregation loss. Extensive experiments demonstrate that the proposed SCSD significantly outperforms existing state-of-theart methods. Notably, SCSD trained on GTAV achieved an average of 49.11 mIoU on the four unseen domain datasets, surpassing the previous state-of-the-art method by +4.08 mIoU. Code is available at https://github.com/nhw649/SCSD.

* Accepted by AAAI 2025

Via

Access Paper or Ask Questions

EOV-Seg: Efficient Open-Vocabulary Panoptic Segmentation

Dec 11, 2024

Hongwei Niu, Jie Hu, Jianghang Lin, Shengchuan Zhang

Figure 1 for EOV-Seg: Efficient Open-Vocabulary Panoptic Segmentation

Figure 2 for EOV-Seg: Efficient Open-Vocabulary Panoptic Segmentation

Figure 3 for EOV-Seg: Efficient Open-Vocabulary Panoptic Segmentation

Figure 4 for EOV-Seg: Efficient Open-Vocabulary Panoptic Segmentation

Abstract:Open-vocabulary panoptic segmentation aims to segment and classify everything in diverse scenes across an unbounded vocabulary. Existing methods typically employ two-stage or single-stage framework. The two-stage framework involves cropping the image multiple times using masks generated by a mask generator, followed by feature extraction, while the single-stage framework relies on a heavyweight mask decoder to make up for the lack of spatial position information through self-attention and cross-attention in multiple stacked Transformer blocks. Both methods incur substantial computational overhead, thereby hindering the efficiency of model inference. To fill the gap in efficiency, we propose EOV-Seg, a novel single-stage, shared, efficient, and spatial-aware framework designed for open-vocabulary panoptic segmentation. Specifically, EOV-Seg innovates in two aspects. First, a Vocabulary-Aware Selection (VAS) module is proposed to improve the semantic comprehension of visual aggregated features and alleviate the feature interaction burden on the mask decoder. Second, we introduce a Two-way Dynamic Embedding Experts (TDEE), which efficiently utilizes the spatial awareness capabilities of ViT-based CLIP backbone. To the best of our knowledge, EOV-Seg is the first open-vocabulary panoptic segmentation framework towards efficiency, which runs faster and achieves competitive performance compared with state-of-the-art methods. Specifically, with COCO training only, EOV-Seg achieves 24.2 PQ, 31.6 mIoU, and 12.7 FPS on the ADE20K dataset for panoptic and semantic segmentation tasks and the inference time of EOV-Seg is 4-21 times faster than state-of-the-art methods. Especially, equipped with ResNet-50 backbone, EOV-Seg runs 25 FPS with only 71M parameters on a single RTX 3090 GPU. Code is available at \url{https://github.com/nhw649/EOV-Seg}.

* Accepted by AAAI 2025

Via

Access Paper or Ask Questions

HUWSOD: Holistic Self-training for Unified Weakly Supervised Object Detection

Jun 27, 2024

Liujuan Cao, Jianghang Lin, Zebo Hong, Yunhang Shen, Shaohui Lin, Chao Chen, Rongrong Ji

Abstract:Most WSOD methods rely on traditional object proposals to generate candidate regions and are confronted with unstable training, which easily gets stuck in a poor local optimum. In this paper, we introduce a unified, high-capacity weakly supervised object detection (WSOD) network called HUWSOD, which utilizes a comprehensive self-training framework without needing external modules or additional supervision. HUWSOD innovatively incorporates a self-supervised proposal generator and an autoencoder proposal generator with a multi-rate resampling pyramid to replace traditional object proposals, enabling end-to-end WSOD training and inference. Additionally, we implement a holistic self-training scheme that refines detection scores and coordinates through step-wise entropy minimization and consistency-constraint regularization, ensuring consistent predictions across stochastic augmentations of the same image. Extensive experiments on PASCAL VOC and MS COCO demonstrate that HUWSOD competes with state-of-the-art WSOD methods, eliminating the need for offline proposals and additional data. The peak performance of HUWSOD approaches that of fully-supervised Faster R-CNN. Our findings also indicate that randomly initialized boxes, although significantly different from well-designed offline object proposals, are effective for WSOD training.

Via

Access Paper or Ask Questions

Depth-Guided Semi-Supervised Instance Segmentation

Jun 25, 2024

Xin Chen, Jie Hu, Xiawu Zheng, Jianghang Lin, Liujuan Cao, Rongrong Ji

Figure 1 for Depth-Guided Semi-Supervised Instance Segmentation

Figure 2 for Depth-Guided Semi-Supervised Instance Segmentation

Figure 3 for Depth-Guided Semi-Supervised Instance Segmentation

Figure 4 for Depth-Guided Semi-Supervised Instance Segmentation

Abstract:Semi-Supervised Instance Segmentation (SSIS) aims to leverage an amount of unlabeled data during training. Previous frameworks primarily utilized the RGB information of unlabeled images to generate pseudo-labels. However, such a mechanism often introduces unstable noise, as a single instance can display multiple RGB values. To overcome this limitation, we introduce a Depth-Guided (DG) SSIS framework. This framework uses depth maps extracted from input images, which represent individual instances with closely associated distance values, offering precise contours for distinct instances. Unlike RGB data, depth maps provide a unique perspective, making their integration into the SSIS process complex. To this end, we propose Depth Feature Fusion, which integrates features extracted from depth estimation. This integration allows the model to understand depth information better and ensure its effective utilization. Additionally, to manage the variability of depth images during training, we introduce the Depth Controller. This component enables adaptive adjustments of the depth map, enhancing convergence speed and dynamically balancing the loss weights between RGB and depth maps. Extensive experiments conducted on the COCO and Cityscapes datasets validate the efficacy of our proposed method. Our approach establishes a new benchmark for SSIS, outperforming previous methods. Specifically, our DG achieves 22.29%, 31.47%, and 35.14% mAP for 1%, 5%, and 10% labeled data on the COCO dataset, respectively.

* 12 pages, 6 figures, 4 tables

Via

Access Paper or Ask Questions

GOI: Find 3D Gaussians of Interest with an Optimizable Open-vocabulary Semantic-space Hyperplane

May 27, 2024

Yansong Qu, Shaohui Dai, Xinyang Li, Jianghang Lin, Liujuan Cao, Shengchuan Zhang, Rongrong Ji

Figure 1 for GOI: Find 3D Gaussians of Interest with an Optimizable Open-vocabulary Semantic-space Hyperplane

Figure 2 for GOI: Find 3D Gaussians of Interest with an Optimizable Open-vocabulary Semantic-space Hyperplane

Figure 3 for GOI: Find 3D Gaussians of Interest with an Optimizable Open-vocabulary Semantic-space Hyperplane

Figure 4 for GOI: Find 3D Gaussians of Interest with an Optimizable Open-vocabulary Semantic-space Hyperplane

Abstract:3D open-vocabulary scene understanding, crucial for advancing augmented reality and robotic applications, involves interpreting and locating specific regions within a 3D space as directed by natural language instructions. To this end, we introduce GOI, a framework that integrates semantic features from 2D vision-language foundation models into 3D Gaussian Splatting (3DGS) and identifies 3D Gaussians of Interest using an Optimizable Semantic-space Hyperplane. Our approach includes an efficient compression method that utilizes scene priors to condense noisy high-dimensional semantic features into compact low-dimensional vectors, which are subsequently embedded in 3DGS. During the open-vocabulary querying process, we adopt a distinct approach compared to existing methods, which depend on a manually set fixed empirical threshold to select regions based on their semantic feature distance to the query text embedding. This traditional approach often lacks universal accuracy, leading to challenges in precisely identifying specific target areas. Instead, our method treats the feature selection process as a hyperplane division within the feature space, retaining only those features that are highly relevant to the query. We leverage off-the-shelf 2D Referring Expression Segmentation (RES) models to fine-tune the semantic-space hyperplane, enabling a more precise distinction between target regions and others. This fine-tuning substantially improves the accuracy of open-vocabulary queries, ensuring the precise localization of pertinent 3D Gaussians. Extensive experiments demonstrate GOI's superiority over previous state-of-the-art methods. Our project page is available at https://goi-hyperplane.github.io/ .

* Our project page is available at https://goi-hyperplane.github.io/

Via

Access Paper or Ask Questions

Weakly Supervised Open-Vocabulary Object Detection

Dec 19, 2023

Jianghang Lin, Yunhang Shen, Bingquan Wang, Shaohui Lin, Ke Li, Liujuan Cao

Figure 1 for Weakly Supervised Open-Vocabulary Object Detection

Figure 2 for Weakly Supervised Open-Vocabulary Object Detection

Figure 3 for Weakly Supervised Open-Vocabulary Object Detection

Figure 4 for Weakly Supervised Open-Vocabulary Object Detection

Abstract:Despite weakly supervised object detection (WSOD) being a promising step toward evading strong instance-level annotations, its capability is confined to closed-set categories within a single training dataset. In this paper, we propose a novel weakly supervised open-vocabulary object detection framework, namely WSOVOD, to extend traditional WSOD to detect novel concepts and utilize diverse datasets with only image-level annotations. To achieve this, we explore three vital strategies, including dataset-level feature adaptation, image-level salient object localization, and region-level vision-language alignment. First, we perform data-aware feature extraction to produce an input-conditional coefficient, which is leveraged into dataset attribute prototypes to identify dataset bias and help achieve cross-dataset generalization. Second, a customized location-oriented weakly supervised region proposal network is proposed to utilize high-level semantic layouts from the category-agnostic segment anything model to distinguish object boundaries. Lastly, we introduce a proposal-concept synchronized multiple-instance network, i.e., object mining and refinement with visual-semantic alignment, to discover objects matched to the text embeddings of concepts. Extensive experiments on Pascal VOC and MS COCO demonstrate that the proposed WSOVOD achieves new state-of-the-art compared with previous WSOD methods in both close-set object localization and detection tasks. Meanwhile, WSOVOD enables cross-dataset and open-vocabulary learning to achieve on-par or even better performance than well-established fully-supervised open-vocabulary object detection (FSOVOD).

* Accepted by AAAI2024

Via

Access Paper or Ask Questions

Active Teacher for Semi-Supervised Object Detection

Mar 15, 2023

Peng Mi, Jianghang Lin, Yiyi Zhou, Yunhang Shen, Gen Luo, Xiaoshuai Sun, Liujuan Cao, Rongrong Fu, Qiang Xu, Rongrong Ji

Figure 1 for Active Teacher for Semi-Supervised Object Detection

Figure 2 for Active Teacher for Semi-Supervised Object Detection

Figure 3 for Active Teacher for Semi-Supervised Object Detection

Figure 4 for Active Teacher for Semi-Supervised Object Detection

Abstract:In this paper, we study teacher-student learning from the perspective of data initialization and propose a novel algorithm called Active Teacher(Source code are available at: \url{https://github.com/HunterJ-Lin/ActiveTeacher}) for semi-supervised object detection (SSOD). Active Teacher extends the teacher-student framework to an iterative version, where the label set is partially initialized and gradually augmented by evaluating three key factors of unlabeled examples, including difficulty, information and diversity. With this design, Active Teacher can maximize the effect of limited label information while improving the quality of pseudo-labels. To validate our approach, we conduct extensive experiments on the MS-COCO benchmark and compare Active Teacher with a set of recently proposed SSOD methods. The experimental results not only validate the superior performance gain of Active Teacher over the compared methods, but also show that it enables the baseline network, ie, Faster-RCNN, to achieve 100% supervised performance with much less label expenditure, ie 40% labeled examples on MS-COCO. More importantly, we believe that the experimental analyses in this paper can provide useful empirical knowledge for data annotation in practical applications.

* 8 pages, 7 figures, CVPR2022

Via

Access Paper or Ask Questions