Can Gao

Fence Theorem: Towards Dual-Objective Semantic-Structure Isolation in Preprocessing Phase for 3D Anomaly Detection

Mar 04, 2025

Hanzhe Liang, Jie Zhou, Xuanxin Chen, Tao Dai, Jinbao Wang, Can Gao

Abstract: 3D anomaly detection (AD) is prominent but difficult because it lacks a unified theoretical foundation for preprocessing design. We establish the Fence Theorem, formalizing preprocessing as a dual-objective semantic isolator: (1) mitigating cross-semantic interference to the greatest extent feasible and (2) confining anomaly judgments to aligned semantic spaces wherever viable, thereby establishing intra-semantic comparability. Any preprocessing approach achieves this goal through a two-stage process consisting of a Semantic-Division stage and a Spatial-Constraints stage. Through systematic deconstruction, we theoretically and experimentally subsume existing preprocessing methods under this theorem via tripartite evidence: qualitative analyses, quantitative studies, and mathematical proofs. Guided by the Fence Theorem, we implement Patch3D, consisting of Patch-Cutting and Patch-Matching modules, to segment semantic spaces and consolidate similar ones while independently modeling normal features within each space. Experiments on Anomaly-ShapeNet and Real3D-AD with different settings demonstrate that progressively finer-grained semantic alignment in preprocessing directly enhances point-level AD accuracy, providing inverse validation of the theorem's causal logic.

Learning with Open-world Noisy Data via Class-independent Margin in Dual Representation Space

Jan 19, 2025

Linchao Pan, Can Gao, Jie Zhou, Jinbao Wang

Abstract: Learning with Noisy Labels (LNL) aims to improve model generalization when facing data with noisy labels, and existing methods generally assume that noisy labels come from known classes, called closed-set noise. However, in real-world scenarios, noisy labels from similar unknown classes, i.e., open-set noise, may occur during the training and inference stages. Such open-world noisy labels may significantly impact the performance of LNL methods. In this study, we propose a novel dual-space joint learning method to robustly handle open-world noise. To mitigate model overfitting on closed-set and open-set noise, a dual representation space is constructed by two networks. One is a projection network that learns shared representations in the prototype space, while the other is a One-Vs-All (OVA) network that makes predictions using unique semantic representations in the class-independent space. Then, bi-level contrastive learning and consistency regularization are introduced in the two spaces to enhance the detection capability for data with unknown classes. To benefit from the memorization effects across different types of samples, class-independent margin criteria are designed for sample identification, which selects clean samples, weights closed-set noise, and filters open-set noise effectively. Extensive experiments demonstrate that our method outperforms the state-of-the-art methods and achieves an average accuracy improvement of 4.55% and an AUROC improvement of 6.17% on CIFAR80N.

* 7 pages of main text, 4 pages of appendix, accepted to AAAI 2025

Fuzzy Granule Density-Based Outlier Detection with Multi-Scale Granular Balls

Jan 06, 2025

Can Gao, Xiaofeng Tan, Jie Zhou, Weiping Ding, Witold Pedrycz

Abstract: Outlier detection refers to the identification of anomalous samples that deviate significantly from the distribution of normal data and has been extensively studied and used in a variety of practical tasks. However, most unsupervised outlier detection methods are carefully designed to detect specified outliers, while real-world data may be entangled with different types of outliers. In this study, we propose a fuzzy rough sets-based multi-scale outlier detection method to identify various types of outliers. Specifically, a novel fuzzy rough sets-based method that integrates relative fuzzy granule density is first introduced to improve the capability of detecting local outliers. Then, a multi-scale view generation method based on granular-ball computing is proposed to collaboratively identify group outliers at different levels of granularity. Moreover, reliable outliers and inliers determined by the three-way decision are used to train a weighted support vector machine to further improve the performance of outlier detection. The proposed method innovatively transforms unsupervised outlier detection into a semi-supervised classification problem and for the first time explores the fuzzy rough sets-based outlier detection from the perspective of multi-scale granular balls, allowing for high adaptability to different types of outliers. Extensive experiments carried out on both artificial and UCI datasets demonstrate that the proposed outlier detection method significantly outperforms the state-of-the-art methods, improving the results by at least 8.48% in terms of the Area Under the ROC Curve (AUROC) index. The source code is released at https://github.com/Xiaofeng-Tan/MGBOD.

* IEEE Transactions on Knowledge and Data Engineering, 2025

FlameGS: Reconstruct flame light field via Gaussian Splatting

Dec 24, 2024

Yunhao Shui, Fuhao Zhang, Can Gao, Hao Xue, Zhiyin Ma, Gang Xun, Xuesong Li

Abstract: To address the time-consuming and computationally intensive nature of traditional ART algorithms for flame combustion diagnosis, we propose a novel representation method for flames, inspired by flame simulation technology. The model captures the luminous process of flames and is supervised with 2D projection images. Experimental validation shows that it achieves an average structural similarity index of 0.96 between actual images and predicted 2D projections, along with a Peak Signal-to-Noise Ratio of 39.05. Additionally, compared to traditional algorithms, it reduces computation time by a factor of approximately 34 and memory use by a factor of about 10.
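
For readers unfamiliar with the Peak Signal-to-Noise Ratio mentioned in this abstract, the following is a minimal sketch of the standard definition, PSNR = 10 log10(MAX^2 / MSE). It is not taken from the FlameGS code; the function name `psnr`, the flat-list image representation, and the assumed pixel range of [0, 1] are illustrative choices only.

```python
import math


def psnr(reference, predicted, max_value=1.0):
    """Standard PSNR in decibels between two equally sized pixel sequences.

    Assumption: pixels are floats in [0, max_value], given as flat lists.
    """
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error between the reference and the prediction.
    mse = sum((r - p) ** 2 for r, p in zip(reference, predicted)) / len(reference)
    if mse == 0:
        return math.inf  # identical images have unbounded PSNR
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy data (hypothetical): every pixel is off by 0.1, so MSE is 0.01.
reference = [0.0, 0.5, 1.0, 0.5]
predicted = [0.1, 0.4, 0.9, 0.6]
score = psnr(reference, predicted)
print(f"PSNR: {score:.2f} dB")
```

Higher values indicate a closer match between the predicted projection and the actual image; the value grows without bound as the mean squared error approaches zero.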

Look Inside for More: Internal Spatial Modality Perception for 3D Anomaly Detection

Dec 18, 2024

Hanzhe Liang, Guoyang Xie, Chengbin Hou, Bingshu Wang, Can Gao, Jinbao Wang

Abstract: 3D anomaly detection has recently become a significant focus in computer vision. Several advanced methods have achieved satisfactory anomaly detection performance. However, they typically concentrate on the external structure of 3D samples and struggle to leverage the internal information embedded within samples. Inspired by the basic intuition "why not look inside for more?", we introduce a straightforward method named Internal Spatial Modality Perception (ISMP) to fully explore the feature representation from internal views. Specifically, our proposed ISMP consists of a critical perception module, Spatial Insight Engine (SIE), which abstracts complex internal information of point clouds into essential global features. In addition, to better align structural information with point data, we propose an enhanced key point feature extraction module for amplifying spatial structure feature representation. Simultaneously, a novel feature filtering module is incorporated to reduce noise and redundant features for further aligning precise spatial structure. Extensive experiments validate the effectiveness of our proposed method, achieving object-level and pixel-level AUROC improvements of 4.2% and 13.1%, respectively, on the Real3D-AD benchmarks. Note that the strong generalization ability of SIE has been theoretically proven and is verified in both classification and segmentation tasks.

* Accepted to AAAI 2025

Low-Light Enhancement Effect on Classification and Detection: An Empirical Study

Sep 22, 2024

Xu Wu, Zhihui Lai, Zhou Jie, Can Gao, Xianxu Hou, Ya-nan Zhang, Linlin Shen

Abstract: Low-light images are commonly encountered in real-world scenarios, and numerous low-light image enhancement (LLIE) methods have been proposed to improve the visibility of these images. The primary goal of LLIE is to generate clearer images that are more visually pleasing to humans. However, the impact of LLIE methods in high-level vision tasks, such as image classification and object detection, which rely on high-quality image datasets, is not well explored. To explore the impact, we comprehensively evaluate LLIE methods on these high-level vision tasks by utilizing an empirical investigation comprising image classification and object detection experiments. The evaluation reveals a dichotomy: while Low-Light Image Enhancement (LLIE) methods enhance human visual interpretation, their effect on computer vision tasks is inconsistent and can sometimes be harmful. Our findings suggest a disconnect between image enhancement for human visual perception and for machine analysis, indicating a need for LLIE methods tailored to support high-level vision tasks effectively. This insight is crucial for the development of LLIE techniques that align with the needs of both human and machine vision.

* 8 pages, 8 figures

Towards High-resolution 3D Anomaly Detection via Group-Level Feature Contrastive Learning

Aug 08, 2024

Hongze Zhu, Guoyang Xie, Chengbin Hou, Tao Dai, Can Gao, Jinbao Wang, Linlin Shen

Abstract: High-resolution point cloud (HRPCD) anomaly detection (AD) plays a critical role in precision machining and high-end equipment manufacturing. Although many 3D-AD methods have been proposed recently, they still cannot meet the requirements of the HRPCD-AD task. There are several challenges: i) it is difficult to directly capture HRPCD information due to the large number of points at the sample level; ii) the advanced transformer-based methods usually obtain anisotropic features, leading to degradation of the representation; iii) the proportion of abnormal areas is very small, which makes them difficult to characterize. To address these challenges, we propose a novel group-level feature-based network, called Group3AD, which has a significantly efficient representation ability. First, we design an Intercluster Uniformity Network (IUN) to present the mapping of different groups in the feature space as several clusters, and obtain a more uniform distribution between clusters representing different parts of the point clouds in the feature space. Then, an Intracluster Alignment Network (IAN) is designed to encourage groups within the cluster to be distributed tightly in the feature space. In addition, we propose an Adaptive Group-Center Selection (AGCS) based on geometric information to improve the pixel density of potential anomalous regions during inference. The experimental results verify the effectiveness of our proposed Group3AD, which surpasses Reg3D-AD by a margin of 5% in terms of object-level AUROC on Real3D-AD. We provide the code and supplementary information on our website: https://github.com/M-3LAB/Group3AD.

* ACMMM24, 12 pages, 5 figures

UNIMO-3: Multi-granularity Interaction for Vision-Language Representation Learning

May 23, 2023

Hao Yang, Can Gao, Hao Liu, Xinyan Xiao, Yanyan Zhao, Bing Qin

Abstract: Vision-and-language (VL) pre-training aims to learn a general representation of image-text pairs that can be transferred to various vision-and-language tasks. Compared with modeling uni-modal data, the main challenge of the VL model is how to learn the cross-modal interaction from multimodal data, especially the fine-grained interaction. Existing works have shown that fully transformer-based models that adopt attention mechanisms to learn in-layer cross-modal interaction can demonstrate impressive performance on various cross-modal downstream tasks. However, they ignore that the semantic information of the different modalities at the same layer is not uniform, which leads to the cross-modal interaction collapsing into a limited multi-modal semantic information interaction. In this work, we propose the UNIMO-3 model, which has the capacity to simultaneously learn the multimodal in-layer interaction and cross-layer interaction. The UNIMO-3 model can establish effective connections between different layers in a cross-modal encoder, and adaptively capture the interaction between two modalities at different levels. The experimental results show that our model achieves state-of-the-art performance in various downstream tasks, and an ablation study shows that effective cross-layer learning improves the model's ability of multimodal representation.

UNIMO-2: End-to-End Unified Vision-Language Grounded Learning

Mar 17, 2022

Wei Li, Can Gao, Guocheng Niu, Xinyan Xiao, Hao Liu, Jiachen Liu, Hua Wu, Haifeng Wang

Abstract: Vision-Language Pre-training (VLP) has achieved impressive performance on various cross-modal downstream tasks. However, most existing methods can only learn from aligned image-caption data and rely heavily on expensive regional features, which greatly limits their scalability and performance. In this paper, we propose an end-to-end unified-modal pre-training framework, namely UNIMO-2, for joint learning on both aligned image-caption data and unaligned image-only and text-only corpora. We build a unified Transformer model to jointly learn visual representations, textual representations and semantic alignment between images and texts. In particular, we propose to conduct grounded learning on both images and texts via a shared grounded space, which helps bridge unaligned images and texts, and align the visual and textual semantic spaces on different types of corpora. The experiments show that our grounded learning method can improve textual and visual semantic alignment, improving performance on various cross-modal tasks. Moreover, benefiting from effective joint modeling of different types of corpora, our model also achieves impressive performance on single-modal visual and textual tasks. Our code and models are public at the UNIMO project page https://unimo-ptm.github.io/.

* Accepted by ACL 2022

Granular conditional entropy-based attribute reduction for partially labeled data with proxy labels

Jan 23, 2021

Can Gao, Jie Zhou, Duoqian Miao, Xiaodong Yue, Jun Wan

Abstract: Attribute reduction is one of the most important research topics in the theory of rough sets, and many rough sets-based attribute reduction methods have thus been presented. However, most of them are specifically designed for dealing with either labeled data or unlabeled data, while many real-world applications come in the form of partial supervision. In this paper, we propose a rough sets-based semi-supervised attribute reduction method for partially labeled data. Particularly, with the aid of prior class distribution information about data, we first develop a simple yet effective strategy to produce the proxy labels for unlabeled data. Then the concept of information granularity is integrated into the information-theoretic measure, based on which, a novel granular conditional entropy measure is proposed, and its monotonicity is proved in theory. Furthermore, a fast heuristic algorithm is provided to generate the optimal reduct of partially labeled data, which could accelerate the process of attribute reduction by removing irrelevant examples and excluding redundant attributes simultaneously. Extensive experiments conducted on UCI data sets demonstrate that the proposed semi-supervised attribute reduction method is promising and even compares favourably with the supervised methods on labeled data and unlabeled data with true labels in terms of classification performance.

* 22 pages, 5 figures, and 5 tables. Preprint submitted to Information Sciences
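
As background for the entropy-based attribute reduction described in this abstract, here is a minimal sketch of the classical Shannon conditional entropy H(D | B) of the decision labels D given the partition induced by an attribute subset B. This is the standard textbook measure, not the paper's granular conditional entropy, which the abstract does not define. The function name `conditional_entropy` and the toy table are hypothetical.

```python
import math
from collections import Counter, defaultdict


def conditional_entropy(rows, attrs, labels):
    """Shannon conditional entropy H(D | B) in bits.

    rows   -- list of tuples of attribute values (assumed discrete)
    attrs  -- indices of the attribute subset B
    labels -- decision label for each row
    """
    # Group labels by equivalence class: rows that agree on every attribute in B.
    blocks = defaultdict(list)
    for row, label in zip(rows, labels):
        blocks[tuple(row[a] for a in attrs)].append(label)

    n = len(rows)
    entropy = 0.0
    for block_labels in blocks.values():
        size = len(block_labels)
        for count in Counter(block_labels).values():
            p = count / size  # label probability within this block
            entropy -= (size / n) * p * math.log2(p)
    return entropy


# Toy decision table (hypothetical): the label is the XOR of two attributes.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = ["a", "b", "b", "a"]

h_one = conditional_entropy(rows, [0], labels)      # one attribute alone
h_both = conditional_entropy(rows, [0, 1], labels)  # both attributes
print(h_one, h_both)
```

In this toy table a single attribute leaves the label fully uncertain, while both attributes together determine it. Adding attributes never increases the classical conditional entropy, which is the kind of monotonicity that a greedy reduct search relies on.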
