Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ye Zheng

ZeroBP: Learning Position-Aware Correspondence for Zero-shot 6D Pose Estimation in Bin-Picking

Feb 03, 2025

Jianqiu Chen, Zikun Zhou, Xin Li, Ye Zheng, Tianpeng Bao, Zhenyu He

Abstract:Bin-picking is a practical and challenging robotic manipulation task, where accurate 6D pose estimation plays a pivotal role. The workpieces in bin-picking are typically textureless and randomly stacked in a bin, which poses a significant challenge to 6D pose estimation. Existing solutions are typically learning-based methods, which require object-specific training. Their efficiency of practical deployment for novel workpieces is highly limited by data collection and model retraining. Zero-shot 6D pose estimation is a potential approach to address the issue of deployment efficiency. Nevertheless, existing zero-shot 6D pose estimation methods are designed to leverage feature matching to establish point-to-point correspondences for pose estimation, which is less effective for workpieces with textureless appearances and ambiguous local regions. In this paper, we propose ZeroBP, a zero-shot pose estimation framework designed specifically for the bin-picking task. ZeroBP learns Position-Aware Correspondence (PAC) between the scene instance and its CAD model, leveraging both local features and global positions to resolve the mismatch issue caused by ambiguous regions with similar shapes and appearances. Extensive experiments on the ROBI dataset demonstrate that ZeroBP outperforms state-of-the-art zero-shot pose estimation methods, achieving an improvement of 9.1% in average recall of correct poses.

* ICRA 2025

Via

Access Paper or Ask Questions

Global-Local MAV Detection under Challenging Conditions based on Appearance and Motion

Dec 18, 2023

Hanqing Guo, Ye Zheng, Yin Zhang, Zhi Gao, Shiyu Zhao

Figure 1 for Global-Local MAV Detection under Challenging Conditions based on Appearance and Motion

Figure 2 for Global-Local MAV Detection under Challenging Conditions based on Appearance and Motion

Figure 3 for Global-Local MAV Detection under Challenging Conditions based on Appearance and Motion

Figure 4 for Global-Local MAV Detection under Challenging Conditions based on Appearance and Motion

Abstract:Visual detection of micro aerial vehicles (MAVs) has received increasing research attention in recent years due to its importance in many applications. However, the existing approaches based on either appearance or motion features of MAVs still face challenges when the background is complex, the MAV target is small, or the computation resource is limited. In this paper, we propose a global-local MAV detector that can fuse both motion and appearance features for MAV detection under challenging conditions. This detector first searches MAV target using a global detector and then switches to a local detector which works in an adaptive search region to enhance accuracy and efficiency. Additionally, a detector switcher is applied to coordinate the global and local detectors. A new dataset is created to train and verify the effectiveness of the proposed detector. This dataset contains more challenging scenarios that can occur in practice. Extensive experiments on three challenging datasets show that the proposed detector outperforms the state-of-the-art ones in terms of detection accuracy and computational efficiency. In particular, this detector can run with near real-time frame rate on NVIDIA Jetson NX Xavier, which demonstrates the usefulness of our approach for real-world applications. The dataset is available at https://github.com/WestlakeIntelligentRobotics/GLAD. In addition, A video summarizing this work is available at https://youtu.be/Tv473mAzHbU.

* 12 pages, 10 figures

Via

Access Paper or Ask Questions

MIAD: A Maintenance Inspection Dataset for Unsupervised Anomaly Detection

Nov 28, 2022

Tianpeng Bao, Jiadong Chen, Wei Li, Xiang Wang, Jingjing Fei, Liwei Wu, Rui Zhao, Ye Zheng

Figure 1 for MIAD: A Maintenance Inspection Dataset for Unsupervised Anomaly Detection

Figure 2 for MIAD: A Maintenance Inspection Dataset for Unsupervised Anomaly Detection

Figure 3 for MIAD: A Maintenance Inspection Dataset for Unsupervised Anomaly Detection

Figure 4 for MIAD: A Maintenance Inspection Dataset for Unsupervised Anomaly Detection

Abstract:Visual anomaly detection plays a crucial role in not only manufacturing inspection to find defects of products during manufacturing processes, but also maintenance inspection to keep equipment in optimum working condition particularly outdoors. Due to the scarcity of the defective samples, unsupervised anomaly detection has attracted great attention in recent years. However, existing datasets for unsupervised anomaly detection are biased towards manufacturing inspection, not considering maintenance inspection which is usually conducted under outdoor uncontrolled environment such as varying camera viewpoints, messy background and degradation of object surface after long-term working. We focus on outdoor maintenance inspection and contribute a comprehensive Maintenance Inspection Anomaly Detection (MIAD) dataset which contains more than 100K high-resolution color images in various outdoor industrial scenarios. This dataset is generated by a 3D graphics software and covers both surface and logical anomalies with pixel-precise ground truth. Extensive evaluations of representative algorithms for unsupervised anomaly detection are conducted, and we expect MIAD and corresponding experimental results can inspire research community in outdoor unsupervised anomaly detection tasks. Worthwhile and related future work can be spawned from our new dataset.

Via

Access Paper or Ask Questions

Uni6Dv2: Noise Elimination for 6D Pose Estimation

Aug 15, 2022

Mingshan Sun, Ye Zheng, Tianpeng Bao, Jianqiu Chen, Guoqiang Jin, Liwei Wu, Rui Zhao, Xiaoke Jiang

Figure 1 for Uni6Dv2: Noise Elimination for 6D Pose Estimation

Figure 2 for Uni6Dv2: Noise Elimination for 6D Pose Estimation

Figure 3 for Uni6Dv2: Noise Elimination for 6D Pose Estimation

Figure 4 for Uni6Dv2: Noise Elimination for 6D Pose Estimation

Abstract:Few prior 6D pose estimation methods use a backbone network to extract features from RGB and depth images, and Uni6D is the pioneer to do so. We find that primary causes of the performance limitation in Uni6D are Instance-Outside and Instance-Inside noise. Uni6D inevitably introduces Instance-Outside noise from background pixels in the receptive field due to its inherently straightforward pipeline design and ignores the Instance-Inside noise in the input depth data. In this work, we propose a two-step denoising method to handle aforementioned noise in Uni6D. In the first step, an instance segmentation network is used to crop and mask the instance to remove noise from non-instance regions. In the second step, a lightweight depth denoising module is proposed to calibrate the depth feature before feeding it into the pose regression network. Extensive experiments show that our method called Uni6Dv2 is able to eliminate the noise effectively and robustly, outperforming Uni6D without sacrificing too much inference efficiency. It also reduces the need for annotated real data that requires costly labeling.

Via

Access Paper or Ask Questions

Benchmarking Unsupervised Anomaly Detection and Localization

May 30, 2022

Ye Zheng, Xiang Wang, Yu Qi, Wei Li, Liwei Wu

Figure 1 for Benchmarking Unsupervised Anomaly Detection and Localization

Figure 2 for Benchmarking Unsupervised Anomaly Detection and Localization

Figure 3 for Benchmarking Unsupervised Anomaly Detection and Localization

Figure 4 for Benchmarking Unsupervised Anomaly Detection and Localization

Abstract:Unsupervised anomaly detection and localization, as of one the most practical and challenging problems in computer vision, has received great attention in recent years. From the time the MVTec AD dataset was proposed to the present, new research methods that are constantly being proposed push its precision to saturation. It is the time to conduct a comprehensive comparison of existing methods to inspire further research. This paper extensively compares 13 papers in terms of the performance in unsupervised anomaly detection and localization tasks, and adds a comparison of inference efficiency previously ignored by the community. Meanwhile, analysis of the MVTec AD dataset are also given, especially the label ambiguity that affects the model fails to achieve full marks. Moreover, considering the proposal of the new MVTec 3D-AD dataset, this paper also conducts experiments using the existing state-of-the-art 2D methods on this new dataset, and reports the corresponding results with analysis.

Via

Access Paper or Ask Questions

Uni6D: A Unified CNN Framework without Projection Breakdown for 6D Pose Estimation

Apr 05, 2022

Xiaoke Jiang, Donghai Li, Hao Chen, Ye Zheng, Rui Zhao, Liwei Wu

Figure 1 for Uni6D: A Unified CNN Framework without Projection Breakdown for 6D Pose Estimation

Figure 2 for Uni6D: A Unified CNN Framework without Projection Breakdown for 6D Pose Estimation

Figure 3 for Uni6D: A Unified CNN Framework without Projection Breakdown for 6D Pose Estimation

Figure 4 for Uni6D: A Unified CNN Framework without Projection Breakdown for 6D Pose Estimation

Abstract:As RGB-D sensors become more affordable, using RGB-D images to obtain high-accuracy 6D pose estimation results becomes a better option. State-of-the-art approaches typically use different backbones to extract features for RGB and depth images. They use a 2D CNN for RGB images and a per-pixel point cloud network for depth data, as well as a fusion network for feature fusion. We find that the essential reason for using two independent backbones is the "projection breakdown" problem. In the depth image plane, the projected 3D structure of the physical world is preserved by the 1D depth value and its built-in 2D pixel coordinate (UV). Any spatial transformation that modifies UV, such as resize, flip, crop, or pooling operations in the CNN pipeline, breaks the binding between the pixel value and UV coordinate. As a consequence, the 3D structure is no longer preserved by a modified depth image or feature. To address this issue, we propose a simple yet effective method denoted as Uni6D that explicitly takes the extra UV data along with RGB-D images as input. Our method has a Unified CNN framework for 6D pose estimation with a single CNN backbone. In particular, the architecture of our method is based on Mask R-CNN with two extra heads, one named RT head for directly predicting 6D pose and the other named abc head for guiding the network to map the visible points to their coordinates in the 3D model as an auxiliary module. This end-to-end approach balances simplicity and accuracy, achieving comparable accuracy with state of the arts and 7.2x faster inference speed on the YCB-Video dataset.

* CVPR2022

Via

Access Paper or Ask Questions

FastFlow: Unsupervised Anomaly Detection and Localization via 2D Normalizing Flows

Nov 16, 2021

Jiawei Yu, Ye Zheng, Xiang Wang, Wei Li, Yushuang Wu, Rui Zhao, Liwei Wu

Figure 1 for FastFlow: Unsupervised Anomaly Detection and Localization via 2D Normalizing Flows

Figure 2 for FastFlow: Unsupervised Anomaly Detection and Localization via 2D Normalizing Flows

Figure 3 for FastFlow: Unsupervised Anomaly Detection and Localization via 2D Normalizing Flows

Figure 4 for FastFlow: Unsupervised Anomaly Detection and Localization via 2D Normalizing Flows

Abstract:Unsupervised anomaly detection and localization is crucial to the practical application when collecting and labeling sufficient anomaly data is infeasible. Most existing representation-based approaches extract normal image features with a deep convolutional neural network and characterize the corresponding distribution through non-parametric distribution estimation methods. The anomaly score is calculated by measuring the distance between the feature of the test image and the estimated distribution. However, current methods can not effectively map image features to a tractable base distribution and ignore the relationship between local and global features which are important to identify anomalies. To this end, we propose FastFlow implemented with 2D normalizing flows and use it as the probability distribution estimator. Our FastFlow can be used as a plug-in module with arbitrary deep feature extractors such as ResNet and vision transformer for unsupervised anomaly detection and localization. In training phase, FastFlow learns to transform the input visual feature into a tractable distribution and obtains the likelihood to recognize anomalies in inference phase. Extensive experimental results on the MVTec AD dataset show that FastFlow surpasses previous state-of-the-art methods in terms of accuracy and inference efficiency with various backbone networks. Our approach achieves 99.4% AUC in anomaly detection with high inference efficiency.

* 11 pages,8 figures

Via

Access Paper or Ask Questions

Focus Your Distribution: Coarse-to-Fine Non-Contrastive Learning for Anomaly Detection and Localization

Oct 09, 2021

Ye Zheng, Xiang Wang, Rui Deng, Tianpeng Bao, Rui Zhao, Liwei Wu

Figure 1 for Focus Your Distribution: Coarse-to-Fine Non-Contrastive Learning for Anomaly Detection and Localization

Figure 2 for Focus Your Distribution: Coarse-to-Fine Non-Contrastive Learning for Anomaly Detection and Localization

Figure 3 for Focus Your Distribution: Coarse-to-Fine Non-Contrastive Learning for Anomaly Detection and Localization

Figure 4 for Focus Your Distribution: Coarse-to-Fine Non-Contrastive Learning for Anomaly Detection and Localization

Abstract:The essence of unsupervised anomaly detection is to learn the compact distribution of normal samples and detect outliers as anomalies in testing. Meanwhile, the anomalies in real-world are usually subtle and fine-grained in a high-resolution image especially for industrial applications. Towards this end, we propose a novel framework for unsupervised anomaly detection and localization. Our method aims at learning dense and compact distribution from normal images with a coarse-to-fine alignment process. The coarse alignment stage standardizes the pixel-wise position of objects in both image and feature levels. The fine alignment stage then densely maximizes the similarity of features among all corresponding locations in a batch. To facilitate the learning with only normal images, we propose a new pretext task called non-contrastive learning for the fine alignment stage. Non-contrastive learning extracts robust and discriminating normal image representations without making assumptions on abnormal samples, and it thus empowers our model to generalize to various anomalous scenarios. Extensive experiments on two typical industrial datasets of MVTec AD and BenTech AD demonstrate that our framework is effective in detecting various real-world defects and achieves a new state-of-the-art in industrial unsupervised anomaly detection.

Via

Access Paper or Ask Questions

Zero-Shot Instance Segmentation

Apr 14, 2021

Ye Zheng, Jiahong Wu, Yongqiang Qin, Faen Zhang, Li Cui

Figure 1 for Zero-Shot Instance Segmentation

Figure 2 for Zero-Shot Instance Segmentation

Figure 3 for Zero-Shot Instance Segmentation

Figure 4 for Zero-Shot Instance Segmentation

Abstract:Deep learning has significantly improved the precision of instance segmentation with abundant labeled data. However, in many areas like medical and manufacturing, collecting sufficient data is extremely hard and labeling this data requires high professional skills. We follow this motivation and propose a new task set named zero-shot instance segmentation (ZSI). In the training phase of ZSI, the model is trained with seen data, while in the testing phase, it is used to segment all seen and unseen instances. We first formulate the ZSI task and propose a method to tackle the challenge, which consists of Zero-shot Detector, Semantic Mask Head, Background Aware RPN and Synchronized Background Strategy. We present a new benchmark for zero-shot instance segmentation based on the MS-COCO dataset. The extensive empirical results in this benchmark show that our method not only surpasses the state-of-the-art results in zero-shot object detection task but also achieves promising performance on ZSI. Our approach will serve as a solid baseline and facilitate future research in zero-shot instance segmentation.

* 8 pages, 6 figures

Via

Access Paper or Ask Questions

Background Learnable Cascade for Zero-Shot Object Detection

Oct 09, 2020

Ye Zheng, Ruoran Huang, Chuanqi Han, Xi Huang, Li Cui

Figure 1 for Background Learnable Cascade for Zero-Shot Object Detection

Figure 2 for Background Learnable Cascade for Zero-Shot Object Detection

Figure 3 for Background Learnable Cascade for Zero-Shot Object Detection

Figure 4 for Background Learnable Cascade for Zero-Shot Object Detection

Abstract:Zero-shot detection (ZSD) is crucial to large-scale object detection with the aim of simultaneously localizing and recognizing unseen objects. There remain several challenges for ZSD, including reducing the ambiguity between background and unseen objects as well as improving the alignment between visual and semantic concept. In this work, we propose a novel framework named Background Learnable Cascade (BLC) to improve ZSD performance. The major contributions for BLC are as follows: (i) we propose a multi-stage cascade structure named Cascade Semantic R-CNN to progressively refine the alignment between visual and semantic of ZSD; (ii) we develop the semantic information flow structure and directly add it between each stage in Cascade Semantic RCNN to further improve the semantic feature learning; (iii) we propose the background learnable region proposal network (BLRPN) to learn an appropriate word vector for background class and use this learned vector in Cascade Semantic R CNN, this design makes \Background Learnable" and reduces the confusion between background and unseen classes. Our extensive experiments show BLC obtains significantly performance improvements for MS-COCO over state-of-the-art methods.

* 18 pages, 5figures

Via

Access Paper or Ask Questions