Abstract:This paper proposes MambaST, a plug-and-play cross-spectral spatial-temporal fusion pipeline for efficient pedestrian detection. Several challenges exist for pedestrian detection in autonomous driving applications. First, it is difficult to perform accurate detection using RGB cameras under dark or low-light conditions. Cross-spectral systems must be developed to integrate complementary information from multiple sensor modalities, such as thermal and visible cameras, to improve detection robustness. Second, pedestrian detection models are latency-sensitive. Efficient and easy-to-scale detection models with fewer parameters are highly desirable for real-time applications such as autonomous driving. Third, pedestrian video data provides spatial-temporal correlations of pedestrian movement. It is beneficial to incorporate temporal as well as spatial information to enhance pedestrian detection. This work leverages recent advances in the state space model (Mamba) and proposes a novel Multi-head Hierarchical Patching and Aggregation (MHHPA) structure to extract both fine-grained and coarse-grained information from both RGB and thermal imagery. Experimental results show that the proposed MHHPA is an effective and efficient alternative to a Transformer model for cross-spectral pedestrian detection. Our proposed model also achieves superior performance on small-scale pedestrian detection. The code is available at https://github.com/XiangboGaoBarry/MambaST.
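The abstract names the MHHPA structure without implementation detail, so the following is only a minimal PyTorch sketch of the general idea: patch the fused cross-spectral feature map at several granularities, mix each token sequence, and aggregate. All module names and shapes are assumptions, and nn.GRU is used purely as a stand-in for a Mamba state-space block; this is not the MambaST implementation.

```python
# Hypothetical sketch of multi-granularity patching + sequence-model fusion
# for cross-spectral (RGB + thermal) feature maps. Names, shapes, and the use
# of nn.GRU in place of a Mamba block are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchicalPatchFusion(nn.Module):
    def __init__(self, dim=64, patch_sizes=(2, 4)):
        super().__init__()
        self.patch_sizes = patch_sizes
        # One sequence mixer per granularity; the paper would use Mamba blocks here.
        self.mixers = nn.ModuleList(
            nn.GRU(dim, dim, batch_first=True) for _ in patch_sizes
        )
        self.proj = nn.Conv2d(dim * len(patch_sizes), dim, kernel_size=1)

    def forward(self, rgb, thermal):
        # rgb, thermal: (B, C, H, W) feature maps from modality-specific backbones.
        x = rgb + thermal  # naive cross-spectral merge for illustration
        b, c, h, w = x.shape
        outs = []
        for p, mixer in zip(self.patch_sizes, self.mixers):
            # Coarse tokens: average-pool into p x p patches, flatten to a sequence.
            tokens = F.avg_pool2d(x, p)                # (B, C, H/p, W/p)
            seq = tokens.flatten(2).transpose(1, 2)    # (B, L, C)
            mixed, _ = mixer(seq)                      # (B, L, C)
            grid = mixed.transpose(1, 2).reshape(b, c, h // p, w // p)
            outs.append(F.interpolate(grid, size=(h, w)))
        return self.proj(torch.cat(outs, dim=1))       # fused (B, C, H, W)
```

For example, `HierarchicalPatchFusion(dim=64)(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))` returns a fused (1, 64, 32, 32) map. Swapping the recurrent stand-in for a Mamba block would preserve the linear scaling in sequence length that underlies the abstract's efficiency claim against Transformer attention.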
Abstract:The morphology of geological particles is crucial in determining their granular characteristics and assembly responses. In this paper, Metaball-function-based solutions are proposed for the morphological characterization and generation of three-dimensional realistic particles from X-ray Computed Tomography (XRCT) images. For characterization, we develop a geometry-based Metaball-Imaging algorithm. This algorithm captures the main contour of parental particles with a series of non-overlapping spheres and refines surface-texture details through gradient search. Four types of particles, comprising hundreds of samples, are used for evaluation. The results show good agreement on key morphological indicators (i.e., volume, surface area, sphericity, circularity, Corey shape factor, nominal diameter, and surface-equivalent-sphere diameter), confirming the algorithm's characterization precision. For generation, we propose the Metaball Variational Autoencoder. Assisted by deep neural networks, this method generates 3D particles in Metaball form while retaining the essential morphological features of the parental particles. Additionally, the method allows control over the generated shapes through an arithmetic pattern, enabling the generation of particles with specific shapes. Two sets of XRCT images, differing in sample number and geometric features, are chosen as parental data. On each training set, one thousand particles are generated for validation. The generation fidelity is demonstrated through comparisons of morphologies and shape-feature distributions between generated and parental particles. Examples are also provided to demonstrate controllability over the generated shapes. Combined with the Metaball-based simulation frameworks previously proposed by the authors, these methods have the potential to provide valuable insights into the properties and behavior of actual geological particles.
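Since the Metaball function is central to both the characterization and generation pipelines, a small sketch may help: a particle is represented as the level set of a field summed over weighted control points. The inverse-square kernel and unit threshold below are common textbook choices and an assumption here, not necessarily the paper's exact formulation.

```python
# Minimal sketch of a Metaball implicit surface: a particle is the region
# where f(x) = sum_i w_i / ||x - c_i||^2 exceeds a threshold. Kernel form and
# threshold are illustrative assumptions.
import numpy as np

def metaball_field(points, centers, weights):
    """Evaluate f(x) at each query point. points: (N, 3), centers: (M, 3), weights: (M,)."""
    d2 = np.sum((points[:, None, :] - centers[None, :, :]) ** 2, axis=-1)  # (N, M)
    return (weights[None, :] / np.maximum(d2, 1e-12)).sum(axis=1)          # (N,)

def inside_particle(points, centers, weights, threshold=1.0):
    """A point lies inside the particle where the field exceeds the threshold."""
    return metaball_field(points, centers, weights) >= threshold

centers = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
weights = np.array([1.0, 0.5])
print(inside_particle(np.array([[0.5, 0.0, 0.0]]), centers, weights))  # [ True]
```

Under this representation, characterization amounts to fitting the centers and weights to XRCT voxel data, and generation amounts to sampling new center/weight sets from a learned latent space.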
Abstract:Traditional pixel-wise image attack algorithms suffer from poor robustness to defense algorithms, i.e., the attack strength degrades dramatically when defense algorithms are applied. Although Generative Adversarial Networks (GANs) can partially address this problem by synthesizing a more semantically meaningful texture pattern, the main limitation is that existing generators can only generate images of a specific scale. In this paper, we propose a scale-free generation-based attack algorithm that synthesizes semantically meaningful adversarial patterns globally for images of arbitrary scale. Our generative attack approach consistently outperforms state-of-the-art methods across a wide range of attack settings, i.e., it substantially degrades the performance of various image classification, object detection, and instance segmentation algorithms under different advanced defense methods.
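The scale-free property plausibly follows from using only size-agnostic (convolutional) layers in the generator; the sketch below illustrates that idea under that assumption. The architecture, Tanh-bounded perturbation, and epsilon budget are illustrative choices, not the paper's network.

```python
# Hypothetical sketch: a fully convolutional perturbation generator has no
# fixed-size (flatten/linear) layers, so it accepts images of arbitrary scale.
import torch
import torch.nn as nn

class ScaleFreeGenerator(nn.Module):
    def __init__(self, channels=3, width=32, eps=8 / 255):
        super().__init__()
        self.eps = eps
        # Only convolutions, so any H x W input is accepted.
        self.body = nn.Sequential(
            nn.Conv2d(channels, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, channels, 3, padding=1), nn.Tanh(),
        )

    def forward(self, x):
        # Bounded perturbation added to the clean image, clipped to valid range.
        return (x + self.eps * self.body(x)).clamp(0, 1)

adv = ScaleFreeGenerator()(torch.rand(1, 3, 224, 224))   # square input
adv2 = ScaleFreeGenerator()(torch.rand(1, 3, 511, 389))  # arbitrary, non-square input
```

Training such a generator against a target model (and against defended variants of it) is what would make the synthesized pattern semantically meaningful rather than pixel-wise noise.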
Abstract:Due to the difficulty of collecting and annotating cancer samples, cervical cancer datasets usually exhibit a long-tailed data distribution. When training a detector to detect cancer cells in a WSI (Whole Slide Image) captured from a TCT (ThinPrep Cytology Test) specimen, head categories (e.g., normal cells and inflammatory cells) typically have a much larger number of samples than tail categories (e.g., cancer cells). Most existing state-of-the-art long-tailed learning methods for object detection focus on category distribution statistics without considering the "hardness" of each sample. To address this problem, we propose a Grad-Libra Loss that leverages gradients to dynamically calibrate the hardness of each sample for different categories and to re-balance the gradients of positive and negative samples. Our loss thus helps the detector put more emphasis on hard samples in both head and tail categories. Extensive experiments on a long-tailed TCT WSI dataset show that mainstream detectors (e.g., RepPoints, FCOS, ATSS, and YOLOF) trained with the proposed Grad-Libra Loss achieve much higher mAP (by 7.8%) than those trained with a cross-entropy classification loss.
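As a rough illustration of gradient-based hardness calibration, the sketch below uses the gradient magnitude of a sigmoid BCE loss with respect to the logit (|p - t|) as a per-sample hardness score and re-balances the total weight of positive and negative samples. This weighting scheme is an assumption for illustration, not the paper's Grad-Libra formulation.

```python
# Minimal sketch in the spirit of the abstract: weight each sample by its
# gradient magnitude (hardness), then equalize the total contribution of
# positive and negative samples. Not the paper's actual loss.
import torch
import torch.nn.functional as F

def gradient_reweighted_bce(logits, targets, gamma=1.0):
    # logits, targets: (N,) with targets in {0, 1}.
    p = torch.sigmoid(logits)
    hardness = (p - targets).abs().detach()  # |dL/dlogit| for sigmoid BCE
    w = hardness.pow(gamma)                  # emphasize hard samples
    pos, neg = targets.bool(), ~targets.bool()
    # Normalize so positives and negatives contribute equal total weight.
    w[pos] = w[pos] / w[pos].sum().clamp(min=1e-12)
    w[neg] = w[neg] / w[neg].sum().clamp(min=1e-12)
    loss = F.binary_cross_entropy_with_logits(
        logits, targets.float(), reduction="none"
    )
    return (w * loss).sum()
```

In a detector, such a loss would replace the per-anchor classification term, so hard tail-category samples keep a meaningful share of the gradient even when negatives vastly outnumber them.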