Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Zhe Jin

Explicit and Implicit Representations in AI-based 3D Reconstruction for Radiology: A systematic literature review

Apr 15, 2025

Yuezhe Yang, Boyu Yang, Yaqian Wang, Yang He, Xingbo Dong, Zhe Jin

Abstract:The demand for high-quality medical imaging in clinical practice and assisted diagnosis has made 3D reconstruction in radiological imaging a key research focus. Artificial intelligence (AI) has emerged as a promising approach to enhancing reconstruction accuracy while reducing acquisition and processing time, thereby minimizing patient radiation exposure and discomfort and ultimately benefiting clinical diagnosis. This review explores state-of-the-art AI-based 3D reconstruction algorithms in radiological imaging, categorizing them into explicit and implicit approaches based on their underlying principles. Explicit methods include point-based, volume-based, and Gaussian representations, while implicit methods encompass implicit prior embedding and neural radiance fields. Additionally, we examine commonly used evaluation metrics and benchmark datasets. Finally, we discuss the current state of development, key challenges, and future research directions in this evolving field. Our project available on: https://github.com/Bean-Young/AI4Med.

* 43 pages, 5 figures, submit to Medical Image Analysis

Via

Access Paper or Ask Questions

Test-Time Domain Generalization via Universe Learning: A Multi-Graph Matching Approach for Medical Image Segmentation

Mar 17, 2025

Xingguo Lv, Xingbo Dong, Liwen Wang, Jiewen Yang, Lei Zhao, Bin Pu, Zhe Jin, Xuejun Li

Abstract:Despite domain generalization (DG) has significantly addressed the performance degradation of pre-trained models caused by domain shifts, it often falls short in real-world deployment. Test-time adaptation (TTA), which adjusts a learned model using unlabeled test data, presents a promising solution. However, most existing TTA methods struggle to deliver strong performance in medical image segmentation, primarily because they overlook the crucial prior knowledge inherent to medical images. To address this challenge, we incorporate morphological information and propose a framework based on multi-graph matching. Specifically, we introduce learnable universe embeddings that integrate morphological priors during multi-source training, along with novel unsupervised test-time paradigms for domain adaptation. This approach guarantees cycle-consistency in multi-matching while enabling the model to more effectively capture the invariant priors of unseen data, significantly mitigating the effects of domain shifts. Extensive experiments demonstrate that our method outperforms other state-of-the-art approaches on two medical image segmentation benchmarks for both multi-source and single-source domain generalization tasks. The source code is available at https://github.com/Yore0/TTDG-MGM.

Via

Access Paper or Ask Questions

Compose Your Aesthetics: Empowering Text-to-Image Models with the Principles of Art

Mar 15, 2025

Zhe Jin, Tat-Seng Chua

Abstract:Text-to-Image (T2I) diffusion models (DM) have garnered widespread adoption due to their capability in generating high-fidelity outputs and accessibility to anyone able to put imagination into words. However, DMs are often predisposed to generate unappealing outputs, much like the random images on the internet they were trained on. Existing approaches to address this are founded on the implicit premise that visual aesthetics is universal, which is limiting. Aesthetics in the T2I context should be about personalization and we propose the novel task of aesthetics alignment which seeks to align user-specified aesthetics with the T2I generation output. Inspired by how artworks provide an invaluable perspective to approach aesthetics, we codify visual aesthetics using the compositional framework artists employ, known as the Principles of Art (PoA). To facilitate this study, we introduce CompArt, a large-scale compositional art dataset building on top of WikiArt with PoA analysis annotated by a capable Multimodal LLM. Leveraging the expressive power of LLMs and training a lightweight and transferrable adapter, we demonstrate that T2I DMs can effectively offer 10 compositional controls through user-specified PoA conditions. Additionally, we design an appropriate evaluation framework to assess the efficacy of our approach.

Via

Access Paper or Ask Questions

A Multi-task Adversarial Attack Against Face Authentication

Aug 15, 2024

Hanrui Wang, Shuo Wang, Cunjian Chen, Massimo Tistarelli, Zhe Jin

Abstract:Deep-learning-based identity management systems, such as face authentication systems, are vulnerable to adversarial attacks. However, existing attacks are typically designed for single-task purposes, which means they are tailored to exploit vulnerabilities unique to the individual target rather than being adaptable for multiple users or systems. This limitation makes them unsuitable for certain attack scenarios, such as morphing, universal, transferable, and counter attacks. In this paper, we propose a multi-task adversarial attack algorithm called MTADV that are adaptable for multiple users or systems. By interpreting these scenarios as multi-task attacks, MTADV is applicable to both single- and multi-task attacks, and feasible in the white- and gray-box settings. Furthermore, MTADV is effective against various face datasets, including LFW, CelebA, and CelebA-HQ, and can work with different deep learning models, such as FaceNet, InsightFace, and CurricularFace. Importantly, MTADV retains its feasibility as a single-task attack targeting a single user/system. To the best of our knowledge, MTADV is the first adversarial attack method that can target all of the aforementioned scenarios in one algorithm.

* Accepted by ACM Transactions on Multimedia Computing, Communications, and Applications

Via

Access Paper or Ask Questions

IFViT: Interpretable Fixed-Length Representation for Fingerprint Matching via Vision Transformer

Apr 12, 2024

Yuhang Qiu, Honghui Chen, Xingbo Dong, Zheng Lin, Iman Yi Liao, Massimo Tistarelli, Zhe Jin

Abstract:Determining dense feature points on fingerprints used in constructing deep fixed-length representations for accurate matching, particularly at the pixel level, is of significant interest. To explore the interpretability of fingerprint matching, we propose a multi-stage interpretable fingerprint matching network, namely Interpretable Fixed-length Representation for Fingerprint Matching via Vision Transformer (IFViT), which consists of two primary modules. The first module, an interpretable dense registration module, establishes a Vision Transformer (ViT)-based Siamese Network to capture long-range dependencies and the global context in fingerprint pairs. It provides interpretable dense pixel-wise correspondences of feature points for fingerprint alignment and enhances the interpretability in the subsequent matching stage. The second module takes into account both local and global representations of the aligned fingerprint pair to achieve an interpretable fixed-length representation extraction and matching. It employs the ViTs trained in the first module with the additional fully connected layer and retrains them to simultaneously produce the discriminative fixed-length representation and interpretable dense pixel-wise correspondences of feature points. Extensive experimental results on diverse publicly available fingerprint databases demonstrate that the proposed framework not only exhibits superior performance on dense registration and matching but also significantly promotes the interpretability in deep fixed-length representations-based fingerprint matching.

* ready to submit to IEEE Transactions on Information Forensics and Security (TIFS)

Via

Access Paper or Ask Questions

On Computational Entanglement and Its Interpretation in Adversarial Machine Learning

Oct 04, 2023

YenLung Lai, Xingbo Dong, Zhe Jin

Abstract:Adversarial examples in machine learning has emerged as a focal point of research due to their remarkable ability to deceive models with seemingly inconspicuous input perturbations, potentially resulting in severe consequences. In this study, we embark on a comprehensive exploration of adversarial machine learning models, shedding light on their intrinsic complexity and interpretability. Our investigation reveals intriguing links between machine learning model complexity and Einstein's theory of special relativity, through the concept of entanglement. More specific, we define entanglement computationally and demonstrate that distant feature samples can exhibit strong correlations, akin to entanglement in quantum realm. This revelation challenges conventional perspectives in describing the phenomenon of adversarial transferability observed in contemporary machine learning models. By drawing parallels with the relativistic effects of time dilation and length contraction during computation, we gain deeper insights into adversarial machine learning, paving the way for more robust and interpretable models in this rapidly evolving field.

Via

Access Paper or Ask Questions

UMS-VINS: United Monocular-Stereo Features for Visual-Inertial Tightly Coupled Odometry

Mar 15, 2023

Chaoyang Jiang, Xiaoni Zheng, Zhe Jin, Chengpu Yu

Abstract:This paper introduces the united monocular-stereo features into a visual-inertial tightly coupled odometry (UMS-VINS) for robust pose estimation. UMS-VINS requires two cameras and a low-cost inertial measurement unit (IMU). The UMS-VINS is an evolution of VINS-FUSION, which modifies the VINS-FUSION from the following three perspectives. 1) UMS-VINS extracts and tracks features from the sub-pixel plane to achieve better positions of the features. 2) UMS-VINS introduces additional 2-dimensional features from the left and/or right cameras. 3) If the visual initialization fails, the IMU propagation is directly used for pose estimation, and if the visual-IMU alignment fails, UMS-VINS estimates the pose via the visual odometry. The performances on both public datasets and new real-world experiments indicate that the proposed UMS-VINS outperforms the VINS-FUSION from the perspective of localization accuracy, localization robustness, and environmental adaptability.

Via

Access Paper or Ask Questions

Range-Aided LiDAR-Inertial Multi-Vehicle Mapping in Degenerate Environment

Mar 15, 2023

Zhe Jin, Chaoyang Jiang

Figure 1 for Range-Aided LiDAR-Inertial Multi-Vehicle Mapping in Degenerate Environment

Figure 2 for Range-Aided LiDAR-Inertial Multi-Vehicle Mapping in Degenerate Environment

Figure 3 for Range-Aided LiDAR-Inertial Multi-Vehicle Mapping in Degenerate Environment

Figure 4 for Range-Aided LiDAR-Inertial Multi-Vehicle Mapping in Degenerate Environment

Abstract:This paper presents a range-aided LiDAR-inertial multi-vehicle mapping system (RaLI-Multi). Firstly, we design a multi-metric weights LiDAR-inertial odometry by fusing observations from an inertial measurement unit (IMU) and a light detection and ranging sensor (LiDAR). The degenerate level and direction are evaluated by analyzing the distribution of normal vectors of feature point clouds and are used to activate the degeneration correction module in which range measurements correct the pose estimation from the degeneration direction. We then design a multi-vehicle mapping system in which a centralized vehicle receives local maps of each vehicle and range measurements between vehicles to optimize a global pose graph. The global map is broadcast to other vehicles for localization and mapping updates, and the centralized vehicle is dynamically fungible. Finally, we provide three experiments to verify the effectiveness of the proposed RaLI-Multi. The results show its superiority in degeneration environments

Via

Access Paper or Ask Questions

Reconstruct Face from Features Using GAN Generator as a Distribution Constraint

Jun 09, 2022

Xingbo Dong, Zhihui Miao, Lan Ma, Jiajun Shen, Zhe Jin, Zhenhua Guo, Andrew Beng Jin Teoh

Figure 1 for Reconstruct Face from Features Using GAN Generator as a Distribution Constraint

Figure 2 for Reconstruct Face from Features Using GAN Generator as a Distribution Constraint

Figure 3 for Reconstruct Face from Features Using GAN Generator as a Distribution Constraint

Figure 4 for Reconstruct Face from Features Using GAN Generator as a Distribution Constraint

Abstract:Face recognition based on the deep convolutional neural networks (CNN) shows superior accuracy performance attributed to the high discriminative features extracted. Yet, the security and privacy of the extracted features from deep learning models (deep features) have been often overlooked. This paper proposes the reconstruction of face images from deep features without accessing the CNN network configurations as a constrained optimization problem. Such optimization minimizes the distance between the features extracted from the original face image and the reconstructed face image. Instead of directly solving the optimization problem in the image space, we innovatively reformulate the problem by looking for a latent vector of a GAN generator, then use it to generate the face image. The GAN generator serves as a dual role in this novel framework, i.e., face distribution constraint of the optimization goal and a face generator. On top of the novel optimization task, we also propose an attack pipeline to impersonate the target user based on the generated face image. Our results show that the generated face images can achieve a state-of-the-art successful attack rate of 98.0\% on LFW under type-I attack @ FAR of 0.1\%. Our work sheds light on the biometric deployment to meet the privacy-preserving and security policies.

Via

Access Paper or Ask Questions

Abandoning the Bayer-Filter to See in the Dark

Mar 22, 2022

Xingbo Dong, Wanyan Xu, Zhihui Miao, Lan Ma, Chao Zhang, Jiewen Yang, Zhe Jin, Andrew Beng Jin Teoh, Jiajun Shen

Figure 1 for Abandoning the Bayer-Filter to See in the Dark

Figure 2 for Abandoning the Bayer-Filter to See in the Dark

Figure 3 for Abandoning the Bayer-Filter to See in the Dark

Figure 4 for Abandoning the Bayer-Filter to See in the Dark

Abstract:Low-light image enhancement - a pervasive but challenging problem, plays a central role in enhancing the visibility of an image captured in a poor illumination environment. Due to the fact that not all photons can pass the Bayer-Filter on the sensor of the color camera, in this work, we first present a De-Bayer-Filter simulator based on deep neural networks to generate a monochrome raw image from the colored raw image. Next, a fully convolutional network is proposed to achieve the low-light image enhancement by fusing colored raw data with synthesized monochrome raw data. Channel-wise attention is also introduced to the fusion process to establish a complementary interaction between features from colored and monochrome raw images. To train the convolutional networks, we propose a dataset with monochrome and color raw pairs named Mono-Colored Raw paired dataset (MCR) collected by using a monochrome camera without Bayer-Filter and a color camera with Bayer-Filter. The proposed pipeline take advantages of the fusion of the virtual monochrome and the color raw images and our extensive experiments indicate that significant improvement can be achieved by leveraging raw sensor data and data-driven learning.

Via

Access Paper or Ask Questions