Abstract:With increasing revelations of academic fraud, detecting forged experimental images in the biomedical field has become a public concern. The challenge lies in the fact that copy-move targets can include background tissue, small foreground objects, or both, which may be out of the training domain and subject to unseen attacks, rendering standard object-detection-based approaches less effective. To address this, we reformulate the problem of detecting biomedical copy-move forgery regions as an intra-image co-saliency detection task and propose CMSeg-Net, a copy-move forgery segmentation network capable of identifying unseen duplicated areas. Built on a multi-resolution encoder-decoder architecture, CMSeg-Net incorporates self-correlation and correlation-assisted spatial-attention modules to detect intra-image regional similarities within feature tensors at each observation scale. This design helps distinguish even small copy-move targets in complex microscopic images from other similar objects. Furthermore, we created a copy-move forgery dataset of optical microscopic images, named FakeParaEgg, using open data from the ICIP 2022 Challenge to support CMSeg-Net's development and verify its performance. Extensive experiments demonstrate that our approach outperforms previous state-of-the-art methods on the FakeParaEgg dataset and other open copy-move detection datasets, including CASIA-CMFD, CoMoFoD, and CMF. The FakeParaEgg dataset, our source code, and the CMF dataset with our manually defined segmentation ground truths are available at ``https://github.com/YoursEver/FakeParaEgg''.
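The idea of detecting intra-image regional similarities by self-correlation can be illustrated with a toy sketch. This is not the CMSeg-Net module itself: the function names, the use of cosine similarity, and the flattened list-of-vectors representation are assumptions made purely for illustration. Each location is scored by its highest similarity to any other location, so duplicated regions stand out with scores close to 1.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def self_correlation_scores(features):
    """For each location, return its highest similarity to any OTHER location.

    `features` is a hypothetical flattened H*W grid of per-location vectors.
    """
    scores = []
    for i, fi in enumerate(features):
        best = max(
            (cosine(fi, fj) for j, fj in enumerate(features) if j != i),
            default=0.0,
        )
        scores.append(best)
    return scores

# Toy feature map with four locations; location 2 duplicates location 0.
features = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
]
scores = self_correlation_scores(features)
```

In this toy input, only the duplicated pair of locations receives a high score. A real network would compute such correlations on learned features at several scales.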
Abstract:Recent advances in VLSI fabrication technology have led to die shrinkage and increased layout density, creating an urgent demand for advanced hotspot detection techniques. However, by taking an object detection network as the backbone, recent learning-based hotspot detectors learn to recognize only the problematic layout patterns in the training data. This makes it difficult for these hotspot detectors to generalize to real-world scenarios. We propose a novel lithography simulator-powered hotspot detection framework to overcome this difficulty. Our framework integrates a lithography simulator with an object detection backbone, merging the extracted latent features from both the simulator and the object detector via well-designed cross-attention blocks. Consequently, the proposed framework can be used to detect potential hotspot regions based on i) the variation of possible circuit shape deformation estimated by the lithography simulator, and ii) the problematic layout patterns already known. To this end, we utilize RetinaNet with a feature pyramid network as the object detection backbone and leverage LithoNet as the lithography simulator. Extensive experiments demonstrate that our proposed simulator-guided hotspot detection framework outperforms previous state-of-the-art methods on real-world data.
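Merging two feature streams through cross-attention can be sketched with plain scaled dot-product attention. The sketch below is an assumption-laden illustration, not the framework's actual block: it omits learned query/key/value projections, and the names `cross_attention` and `fuse`, as well as the residual addition, are hypothetical choices. Detector features act as queries, and simulator features act as keys and values.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        logits = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(logits)
        # Weighted sum of value vectors, one output channel at a time.
        out = [
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs

def fuse(detector_feats, simulator_feats):
    """Hypothetical fusion: detector features plus attended simulator features."""
    attended = cross_attention(detector_feats, simulator_feats, simulator_feats)
    return [[a + b for a, b in zip(d, s)] for d, s in zip(detector_feats, attended)]

# Toy example: two detector tokens attend over two simulator tokens.
detector_feats = [[1.0, 0.0], [0.0, 1.0]]
simulator_feats = [[2.0, 0.0], [0.0, 2.0]]
fused = fuse(detector_feats, simulator_feats)
```

Each fused vector keeps the detector's own feature and adds a simulator-derived component weighted toward the simulator tokens most similar to it.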
Abstract:Learning-based pre-simulation (i.e., layout-to-fabrication) models have been proposed to predict the fabrication-induced shape deformation from an IC layout to its fabricated circuit. Such models are usually driven by pairwise learning, involving a training set of layout patterns and their reference shape images after fabrication. However, it is expensive and time-consuming to collect the reference shape images of all layout clips for model training and updating. To address the problem, we propose a deep learning-based layout novelty detection scheme to identify novel (unseen) layout patterns, which cannot be well predicted by a pre-trained pre-simulation model. We devise a global-local novelty scoring mechanism to assess the potential novelty of a layout by exploiting two subnetworks: an autoencoder and a pre-trained pre-simulation model. The former characterizes the global structural dissimilarity between a given layout and training samples, whereas the latter extracts a latent code representing the fabrication-induced local deformation. By integrating the global dissimilarity with the local deformation boosted by a self-attention mechanism, our model can accurately detect novelties without the ground-truth circuit shapes of test samples. Based on the detected novelties, we further propose two active-learning strategies to sample a reduced number of representative layouts that are most worth fabricating to acquire their ground-truth circuit shapes. Experimental results demonstrate i) our method's effectiveness in layout novelty detection, and ii) our active-learning strategies' ability to select representative novel layouts for keeping a learning-based pre-simulation model updated.
Abstract:Label noise in training data can significantly degrade a model's generalization performance for supervised learning tasks. Here we focus on the case in which noisy labels are primarily mislabeled samples that tend to be concentrated near decision boundaries, rather than uniformly distributed, and whose features are expected to be equivocal. To address the problem, we propose an ensemble learning method to correct noisy labels by exploiting the local structures of feature manifolds. Different from typical ensemble strategies that increase the prediction diversity among sub-models via certain loss terms, our method trains sub-models on disjoint subsets, each being a union of the nearest neighbors of randomly selected seed samples on the data manifold. As a result, each sub-model can learn a coarse representation of the data manifold along with a corresponding graph. Moreover, only a limited number of sub-models will be affected by locally concentrated noisy labels. The constructed graphs are used to suggest a series of label correction candidates, and accordingly, our method derives label correction results by voting down inconsistent suggestions. Our experiments on real-world noisy label datasets demonstrate the superiority of the proposed method over existing state-of-the-art methods.
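The step of voting down inconsistent suggestions can be pictured with a simple majority-vote sketch. The rule below is an assumption for illustration only, not the paper's actual voting scheme: `correct_labels`, the `min_agreement` threshold, and the strict-majority criterion are all hypothetical. A label is changed only when a sufficiently large share of the sub-model suggestions agree on a different label.

```python
from collections import Counter

def correct_labels(noisy_labels, suggestions, min_agreement=0.5):
    """Relabel a sample only if its suggestions agree strongly enough.

    `suggestions[i]` is the list of labels proposed for sample i by the
    sub-models (a hypothetical stand-in for graph-based candidates).
    """
    corrected = []
    for label, votes in zip(noisy_labels, suggestions):
        if not votes:
            corrected.append(label)  # no suggestion: keep the given label
            continue
        top, count = Counter(votes).most_common(1)[0]
        if top != label and count / len(votes) > min_agreement:
            corrected.append(top)    # consistent suggestion wins
        else:
            corrected.append(label)  # inconsistent suggestions are voted down
    return corrected

noisy = ["cat", "dog", "cat"]
suggestions = [
    ["cat", "cat", "dog"],          # majority agrees with the given label
    ["cat", "cat", "cat"],          # unanimous disagreement: relabel
    ["dog", "bird", "cat", "dog"],  # no strict majority: keep the label
]
corrected = correct_labels(noisy, suggestions)
```

Requiring a strict majority keeps the original label whenever the sub-models disagree among themselves, which mirrors the intuition that only a few sub-models are affected by locally concentrated noise.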
Abstract:Recently, falsified images have been found in papers involved in research misconduct. However, although there have been many image forgery detection methods, none of them was designed for molecular-biological experiment images. In this paper, we propose a fast blind inquiry method, named FBI$_{GEL}$, for checking the integrity of images obtained from two common sorts of molecular experiments, i.e., western blot (WB) and polymerase chain reaction (PCR). Based on an optimized pseudo-background capable of highlighting local residues, FBI$_{GEL}$ can reveal traceable vestiges suggesting inappropriate local modifications on WB/PCR images. Additionally, because the optimized pseudo-background is derived according to a closed-form solution, FBI$_{GEL}$ is computationally efficient and thus suitable for large-scale inquiry tasks for WB/PCR image integrity. We applied FBI$_{GEL}$ to several papers questioned by the public on PubPeer, and our results show that figures of those papers indeed contain doubtful unnatural patterns.
Abstract:The performance of a convolutional neural network (CNN) based face recognition model largely relies on the richness of labelled training data. Collecting a training set with large variations of a face identity under different poses and illumination changes, however, is very expensive, making the diversity of within-class face images a critical issue in practice. In this paper, we propose a 3D model-assisted domain-transferred face augmentation network (DotFAN) that can generate a series of variants of an input face based on the knowledge distilled from existing rich face datasets collected from other domains. DotFAN is structurally a conditional CycleGAN but has two additional subnetworks, namely face expert network (FEM) and face shape regressor (FSR), for latent code control. While FSR aims to extract face attributes, FEM is designed to capture a face identity. With their aid, DotFAN can learn a disentangled face representation and effectively generate face images of various facial attributes while preserving the identity of augmented faces. Experiments show that DotFAN is beneficial for augmenting small face datasets to improve their within-class diversity so that a better face recognition model can be learned from the augmented dataset.
Abstract:Since IC fabrication is costly and time-consuming, it is highly desirable to develop virtual metrology tools that can predict the properties of a wafer based on fabrication configurations without performing physical measurements on a fabricated IC. We propose a deep learning-based data-driven framework consisting of two convolutional neural networks: i) LithoNet that predicts the shape deformations on a circuit due to IC fabrication, and ii) OPCNet that suggests IC layout corrections to compensate for such shape deformations. By learning the shape correspondence between pairs of layout design patterns and the SEM images of their product wafers, given an IC layout pattern, LithoNet can mimic the fabrication procedure to predict its fabricated circuit shape for virtual metrology. Furthermore, LithoNet can take the wafer fabrication parameters as a latent vector to model the parametric product variations that can be inspected on SEM images. In addition, traditional lithography simulation methods used to suggest a correction on a lithographic photomask are computationally expensive. Our proposed OPCNet mimics the optical proximity correction (OPC) procedure and efficiently generates a corrected photomask by collaborating with LithoNet to examine if the shape of a fabricated IC circuitry best matches its original layout design. As a result, the proposed LithoNet-OPCNet framework can not only predict the shape of a fabricated IC from its layout pattern, but also suggest a layout correction according to the consistency between the predicted shape and the given layout. Experimental results with several benchmark layout patterns demonstrate the effectiveness of the proposed method.