Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xin Luo

Dyna3DGR: 4D Cardiac Motion Tracking with Dynamic 3D Gaussian Representation

Jul 22, 2025

Xueming Fu, Pei Wu, Yingtai Li, Xin Luo, Zihang Jiang, Junhao Mei, Jian Lu, Gao-Jun Teng, S. Kevin Zhou

Abstract:Accurate analysis of cardiac motion is crucial for evaluating cardiac function. While dynamic cardiac magnetic resonance imaging (CMR) can capture detailed tissue motion throughout the cardiac cycle, the fine-grained 4D cardiac motion tracking remains challenging due to the homogeneous nature of myocardial tissue and the lack of distinctive features. Existing approaches can be broadly categorized into image based and representation-based, each with its limitations. Image-based methods, including both raditional and deep learning-based registration approaches, either struggle with topological consistency or rely heavily on extensive training data. Representation-based methods, while promising, often suffer from loss of image-level details. To address these limitations, we propose Dynamic 3D Gaussian Representation (Dyna3DGR), a novel framework that combines explicit 3D Gaussian representation with implicit neural motion field modeling. Our method simultaneously optimizes cardiac structure and motion in a self-supervised manner, eliminating the need for extensive training data or point-to-point correspondences. Through differentiable volumetric rendering, Dyna3DGR efficiently bridges continuous motion representation with image-space alignment while preserving both topological and temporal consistency. Comprehensive evaluations on the ACDC dataset demonstrate that our approach surpasses state-of-the-art deep learning-based diffeomorphic registration methods in tracking accuracy. The code will be available in https://github.com/windrise/Dyna3DGR.

* Accepted to MICCAI 2025

Via

Access Paper or Ask Questions

OmniGen2: Exploration to Advanced Multimodal Generation

Jun 23, 2025

Chenyuan Wu, Pengfei Zheng, Ruiran Yan, Shitao Xiao, Xin Luo, Yueze Wang, Wanli Li, Xiyan Jiang, Yexin Liu, Junjie Zhou(+12 more)

Abstract:In this work, we introduce OmniGen2, a versatile and open-source generative model designed to provide a unified solution for diverse generation tasks, including text-to-image, image editing, and in-context generation. Unlike OmniGen v1, OmniGen2 features two distinct decoding pathways for text and image modalities, utilizing unshared parameters and a decoupled image tokenizer. This design enables OmniGen2 to build upon existing multimodal understanding models without the need to re-adapt VAE inputs, thereby preserving the original text generation capabilities. To facilitate the training of OmniGen2, we developed comprehensive data construction pipelines, encompassing image editing and in-context generation data. Additionally, we introduce a reflection mechanism tailored for image generation tasks and curate a dedicated reflection dataset based on OmniGen2. Despite its relatively modest parameter size, OmniGen2 achieves competitive results on multiple task benchmarks, including text-to-image and image editing. To further evaluate in-context generation, also referred to as subject-driven tasks, we introduce a new benchmark named OmniContext. OmniGen2 achieves state-of-the-art performance among open-source models in terms of consistency. We will release our models, training code, datasets, and data construction pipeline to support future research in this field. Project Page: https://vectorspacelab.github.io/OmniGen2; GitHub Link: https://github.com/VectorSpaceLab/OmniGen2

Via

Access Paper or Ask Questions

Cross-Modal Clustering-Guided Negative Sampling for Self-Supervised Joint Learning from Medical Images and Reports

Jun 13, 2025

Libin Lan, Hongxing Li, Zunhui Xia, Juan Zhou, Xiaofei Zhu, Yongmei Li, Yudong Zhang, Xin Luo

Abstract:Learning medical visual representations directly from paired images and reports through multimodal self-supervised learning has emerged as a novel and efficient approach to digital diagnosis in recent years. However, existing models suffer from several severe limitations. 1) neglecting the selection of negative samples, resulting in the scarcity of hard negatives and the inclusion of false negatives; 2) focusing on global feature extraction, but overlooking the fine-grained local details that are crucial for medical image recognition tasks; and 3) contrastive learning primarily targets high-level features but ignoring low-level details which are essential for accurate medical analysis. Motivated by these critical issues, this paper presents a Cross-Modal Cluster-Guided Negative Sampling (CM-CGNS) method with two-fold ideas. First, it extends the k-means clustering used for local text features in the single-modal domain to the multimodal domain through cross-modal attention. This improvement increases the number of negative samples and boosts the model representation capability. Second, it introduces a Cross-Modal Masked Image Reconstruction (CM-MIR) module that leverages local text-to-image features obtained via cross-modal attention to reconstruct masked local image regions. This module significantly strengthens the model's cross-modal information interaction capabilities and retains low-level image features essential for downstream tasks. By well handling the aforementioned limitations, the proposed CM-CGNS can learn effective and robust medical visual representations suitable for various recognition tasks. Extensive experimental results on classification, detection, and segmentation tasks across five downstream datasets show that our method outperforms state-of-the-art approaches on multiple metrics, verifying its superior performance.

* This work has been submitted to the IEEE TMI for possible publication. Our code is available at https://github.com/violet-42/CM-CGNS

Via

Access Paper or Ask Questions

DATA: Multi-Disentanglement based Contrastive Learning for Open-World Semi-Supervised Deepfake Attribution

May 07, 2025

Ming-Hui Liu, Xiao-Qian Liu, Xin Luo, Xin-Shun Xu

Abstract:Deepfake attribution (DFA) aims to perform multiclassification on different facial manipulation techniques, thereby mitigating the detrimental effects of forgery content on the social order and personal reputations. However, previous methods focus only on method-specific clues, which easily lead to overfitting, while overlooking the crucial role of common forgery features. Additionally, they struggle to distinguish between uncertain novel classes in more practical open-world scenarios. To address these issues, in this paper we propose an innovative multi-DisentAnglement based conTrastive leArning framework, DATA, to enhance the generalization ability on novel classes for the open-world semi-supervised deepfake attribution (OSS-DFA) task. Specifically, since all generation techniques can be abstracted into a similar architecture, DATA defines the concept of 'Orthonormal Deepfake Basis' for the first time and utilizes it to disentangle method-specific features, thereby reducing the overfitting on forgery-irrelevant information. Furthermore, an augmented-memory mechanism is designed to assist in novel class discovery and contrastive learning, which aims to obtain clear class boundaries for the novel classes through instance-level disentanglements. Additionally, to enhance the standardization and discrimination of features, DATA uses bases contrastive loss and center contrastive loss as auxiliaries for the aforementioned modules. Extensive experimental evaluations show that DATA achieves state-of-the-art performance on the OSS-DFA benchmark, e.g., there are notable accuracy improvements in 2.55% / 5.7% under different settings, compared with the existing methods.

* Accepted by IEEE TMM on 17-Jan-2025; Submitted to IEEE TMM on 11-Jul-2024

Via

Access Paper or Ask Questions

Learning Real Facial Concepts for Independent Deepfake Detection

May 07, 2025

Ming-Hui Liu, Harry Cheng, Tianyi Wang, Xin Luo, Xin-Shun Xu

Abstract:Deepfake detection models often struggle with generalization to unseen datasets, manifesting as misclassifying real instances as fake in target domains. This is primarily due to an overreliance on forgery artifacts and a limited understanding of real faces. To address this challenge, we propose a novel approach RealID to enhance generalization by learning a comprehensive concept of real faces while assessing the probabilities of belonging to the real and fake classes independently. RealID comprises two key modules: the Real Concept Capture Module (RealC2) and the Independent Dual-Decision Classifier (IDC). With the assistance of a MultiReal Memory, RealC2 maintains various prototypes for real faces, allowing the model to capture a comprehensive concept of real class. Meanwhile, IDC redefines the classification strategy by making independent decisions based on the concept of the real class and the presence of forgery artifacts. Through the combined effect of the above modules, the influence of forgery-irrelevant patterns is alleviated, and extensive experiments on five widely used datasets demonstrate that RealID significantly outperforms existing state-of-the-art methods, achieving a 1.74% improvement in average accuracy.

* Accepted by IJCAI 2025

Via

Access Paper or Ask Questions

Concept Factorization via Self-Representation and Adaptive Graph Structure Learning

May 06, 2025

Zhengqin Yang, Di Wu, Jia Chen, Xin Luo

Abstract:Concept Factorization (CF) models have attracted widespread attention due to their excellent performance in data clustering. In recent years, many variant models based on CF have achieved great success in clustering by taking into account the internal geometric manifold structure of the dataset and using graph regularization techniques. However, their clustering performance depends greatly on the construction of the initial graph structure. In order to enable adaptive learning of the graph structure of the data, we propose a Concept Factorization Based on Self-Representation and Adaptive Graph Structure Learning (CFSRAG) Model. CFSRAG learns the affinity relationship between data through a self-representation method, and uses the learned affinity matrix to implement dynamic graph regularization constraints, thereby ensuring dynamic learning of the internal geometric structure of the data. Finally, we give the CFSRAG update rule and convergence analysis, and conduct comparative experiments on four real datasets. The results show that our model outperforms other state-of-the-art models.

Via

Access Paper or Ask Questions

Stronger, Steadier & Superior: Geometric Consistency in Depth VFM Forges Domain Generalized Semantic Segmentation

Apr 17, 2025

Siyu Chen, Ting Han, Changshe Zhang, Xin Luo, Meiliu Wu, Guorong Cai, Jinhe Su

Figure 1 for Stronger, Steadier & Superior: Geometric Consistency in Depth VFM Forges Domain Generalized Semantic Segmentation

Figure 2 for Stronger, Steadier & Superior: Geometric Consistency in Depth VFM Forges Domain Generalized Semantic Segmentation

Figure 3 for Stronger, Steadier & Superior: Geometric Consistency in Depth VFM Forges Domain Generalized Semantic Segmentation

Figure 4 for Stronger, Steadier & Superior: Geometric Consistency in Depth VFM Forges Domain Generalized Semantic Segmentation

Abstract:Vision Foundation Models (VFMs) have delivered remarkable performance in Domain Generalized Semantic Segmentation (DGSS). However, recent methods often overlook the fact that visual cues are susceptible, whereas the underlying geometry remains stable, rendering depth information more robust. In this paper, we investigate the potential of integrating depth information with features from VFMs, to improve the geometric consistency within an image and boost the generalization performance of VFMs. We propose a novel fine-tuning DGSS framework, named DepthForge, which integrates the visual cues from frozen DINOv2 or EVA02 and depth cues from frozen Depth Anything V2. In each layer of the VFMs, we incorporate depth-aware learnable tokens to continuously decouple domain-invariant visual and spatial information, thereby enhancing depth awareness and attention of the VFMs. Finally, we develop a depth refinement decoder and integrate it into the model architecture to adaptively refine multi-layer VFM features and depth-aware learnable tokens. Extensive experiments are conducted based on various DGSS settings and five different datsets as unseen target domains. The qualitative and quantitative results demonstrate that our method significantly outperforms alternative approaches with stronger performance, steadier visual-spatial attention, and superior generalization ability. In particular, DepthForge exhibits outstanding performance under extreme conditions (e.g., night and snow). Code is available at https://github.com/anonymouse-xzrptkvyqc/DepthForge.

Via

Access Paper or Ask Questions

Scale-Invariant Adversarial Attack against Arbitrary-scale Super-resolution

Mar 06, 2025

Yihao Huang, Xin Luo, Qing Guo, Felix Juefei-Xu, Xiaojun Jia, Weikai Miao, Geguang Pu, Yang Liu

Figure 1 for Scale-Invariant Adversarial Attack against Arbitrary-scale Super-resolution

Figure 2 for Scale-Invariant Adversarial Attack against Arbitrary-scale Super-resolution

Figure 3 for Scale-Invariant Adversarial Attack against Arbitrary-scale Super-resolution

Figure 4 for Scale-Invariant Adversarial Attack against Arbitrary-scale Super-resolution

Abstract:The advent of local continuous image function (LIIF) has garnered significant attention for arbitrary-scale super-resolution (SR) techniques. However, while the vulnerabilities of fixed-scale SR have been assessed, the robustness of continuous representation-based arbitrary-scale SR against adversarial attacks remains an area warranting further exploration. The elaborately designed adversarial attacks for fixed-scale SR are scale-dependent, which will cause time-consuming and memory-consuming problems when applied to arbitrary-scale SR. To address this concern, we propose a simple yet effective ``scale-invariant'' SR adversarial attack method with good transferability, termed SIAGT. Specifically, we propose to construct resource-saving attacks by exploiting finite discrete points of continuous representation. In addition, we formulate a coordinate-dependent loss to enhance the cross-model transferability of the attack. The attack can significantly deteriorate the SR images while introducing imperceptible distortion to the targeted low-resolution (LR) images. Experiments carried out on three popular LIIF-based SR approaches and four classical SR datasets show remarkable attack performance and transferability of SIAGT.

* 15 pages, accepted by TIFS 2025

Via

Access Paper or Ask Questions

Breaking Fine-Grained Classification Barriers with Cost-Free Data in Few-Shot Class-Incremental Learning

Dec 29, 2024

Li-Jun Zhao, Zhen-Duo Chen, Zhi-Yuan Xue, Xin Luo, Xin-Shun Xu

Figure 1 for Breaking Fine-Grained Classification Barriers with Cost-Free Data in Few-Shot Class-Incremental Learning

Figure 2 for Breaking Fine-Grained Classification Barriers with Cost-Free Data in Few-Shot Class-Incremental Learning

Figure 3 for Breaking Fine-Grained Classification Barriers with Cost-Free Data in Few-Shot Class-Incremental Learning

Figure 4 for Breaking Fine-Grained Classification Barriers with Cost-Free Data in Few-Shot Class-Incremental Learning

Abstract:Current fine-grained classification research mainly concentrates on fine-grained feature learning, but in real-world applications, the bigger issue often lies in the data. Fine-grained data annotation is challenging, and the features and semantics are highly diverse and frequently changing, making traditional methods less effective in real-world scenarios. Although some studies have provided potential solutions to this issue, most are limited to making use of limited supervised information. In this paper, we propose a novel learning paradigm to break barriers in fine-grained classification. It enables the model to learn beyond the standard training phase and benefit from cost-free data encountered during system operation. On this basis, an efficient EXPloring and EXPloiting strategy and method (EXP2) is designed. Thereinto, before the final classification results are obtained, representative inference data samples are explored according to class templates and exploited to optimize classifiers. Experimental results demonstrate the general effectiveness of EXP2.

* 29 pages

Via

Access Paper or Ask Questions

NeRF Inpainting with Geometric Diffusion Prior and Balanced Score Distillation

Nov 23, 2024

Menglin Zhang, Xin Luo, Yunwei Lan, Chang Liu, Rui Li, Kaidong Zhang, Ganlin Yang, Dong Liu

Figure 1 for NeRF Inpainting with Geometric Diffusion Prior and Balanced Score Distillation

Figure 2 for NeRF Inpainting with Geometric Diffusion Prior and Balanced Score Distillation

Figure 3 for NeRF Inpainting with Geometric Diffusion Prior and Balanced Score Distillation

Figure 4 for NeRF Inpainting with Geometric Diffusion Prior and Balanced Score Distillation

Abstract:Recent advances in NeRF inpainting have leveraged pretrained diffusion models to enhance performance. However, these methods often yield suboptimal results due to their ineffective utilization of 2D diffusion priors. The limitations manifest in two critical aspects: the inadequate capture of geometric information by pretrained diffusion models and the suboptimal guidance provided by existing Score Distillation Sampling (SDS) methods. To address these problems, we introduce GB-NeRF, a novel framework that enhances NeRF inpainting through improved utilization of 2D diffusion priors. Our approach incorporates two key innovations: a fine-tuning strategy that simultaneously learns appearance and geometric priors and a specialized normal distillation loss that integrates these geometric priors into NeRF inpainting. We propose a technique called Balanced Score Distillation (BSD) that surpasses existing methods such as Score Distillation (SDS) and the improved version, Conditional Score Distillation (CSD). BSD offers improved inpainting quality in appearance and geometric aspects. Extensive experiments show that our method provides superior appearance fidelity and geometric consistency compared to existing approaches.

Via

Access Paper or Ask Questions