Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Haozhe Jia

RMDM: Radio Map Diffusion Model with Physics Informed

Jan 31, 2025

Haozhe Jia, Wenshuo Chen, Zhihui Huang, Hongru Xiao, Nanqian Jia, Keming Wu, Songning Lai, Yutao Yue

Abstract:With the rapid development of wireless communication technology, the efficient utilization of spectrum resources, optimization of communication quality, and intelligent communication have become critical. Radio map reconstruction is essential for enabling advanced applications, yet challenges such as complex signal propagation and sparse data hinder accurate reconstruction. To address these issues, we propose the **Radio Map Diffusion Model (RMDM)**, a physics-informed framework that integrates **Physics-Informed Neural Networks (PINNs)** to incorporate constraints like the **Helmholtz equation**. RMDM employs a dual U-Net architecture: the first ensures physical consistency by minimizing PDE residuals, boundary conditions, and source constraints, while the second refines predictions via diffusion-based denoising. By leveraging physical laws, RMDM significantly enhances accuracy, robustness, and generalization. Experiments demonstrate that RMDM outperforms state-of-the-art methods, achieving **NMSE of 0.0031** and **RMSE of 0.0125** under the Static RM (SRM) setting, and **NMSE of 0.0047** and **RMSE of 0.0146** under the Dynamic RM (DRM) setting. These results establish a novel paradigm for integrating physics-informed and data-driven approaches in radio map reconstruction, particularly under sparse data conditions.

Via

Access Paper or Ask Questions

Free-T2M: Frequency Enhanced Text-to-Motion Diffusion Model With Consistency Loss

Jan 30, 2025

Wenshuo Chen, Haozhe Jia, Songning Lai, Keming Wu, Hongru Xiao, Lijie Hu, Yutao Yue

Figure 1 for Free-T2M: Frequency Enhanced Text-to-Motion Diffusion Model With Consistency Loss

Figure 2 for Free-T2M: Frequency Enhanced Text-to-Motion Diffusion Model With Consistency Loss

Figure 3 for Free-T2M: Frequency Enhanced Text-to-Motion Diffusion Model With Consistency Loss

Figure 4 for Free-T2M: Frequency Enhanced Text-to-Motion Diffusion Model With Consistency Loss

Abstract:Rapid progress in text-to-motion generation has been largely driven by diffusion models. However, existing methods focus solely on temporal modeling, thereby overlooking frequency-domain analysis. We identify two key phases in motion denoising: the **semantic planning stage** and the **fine-grained improving stage**. To address these phases effectively, we propose **Fre**quency **e**nhanced **t**ext-**to**-**m**otion diffusion model (**Free-T2M**), incorporating stage-specific consistency losses that enhance the robustness of static features and improve fine-grained accuracy. Extensive experiments demonstrate the effectiveness of our method. Specifically, on StableMoFusion, our method reduces the FID from **0.189** to **0.051**, establishing a new SOTA performance within the diffusion architecture. These findings highlight the importance of incorporating frequency-domain insights into text-to-motion generation for more precise and robust results.

Via

Access Paper or Ask Questions

DCTdiff: Intriguing Properties of Image Generative Modeling in the DCT Space

Dec 19, 2024

Mang Ning, Mingxiao Li, Jianlin Su, Haozhe Jia, Lanmiao Liu, Martin Beneš, Albert Ali Salah, Itir Onal Ertugrul

Abstract:This paper explores image modeling from the frequency space and introduces DCTdiff, an end-to-end diffusion generative paradigm that efficiently models images in the discrete cosine transform (DCT) space. We investigate the design space of DCTdiff and reveal the key design factors. Experiments on different frameworks (UViT, DiT), generation tasks, and various diffusion samplers demonstrate that DCTdiff outperforms pixel-based diffusion models regarding generative quality and training efficiency. Remarkably, DCTdiff can seamlessly scale up to high-resolution generation without using the latent diffusion paradigm. Finally, we illustrate several intriguing properties of DCT image modeling. For example, we provide a theoretical proof of why `image diffusion can be seen as spectral autoregression', bridging the gap between diffusion and autoregressive models. The effectiveness of DCTdiff and the introduced properties suggest a promising direction for image modeling in the frequency space. The code is at \url{https://github.com/forever208/DCTdiff}.

* 23 pages

Via

Access Paper or Ask Questions

Block-wise LoRA: Revisiting Fine-grained LoRA for Effective Personalization and Stylization in Text-to-Image Generation

Mar 12, 2024

Likun Li, Haoqi Zeng, Changpeng Yang, Haozhe Jia, Di Xu

Abstract:The objective of personalization and stylization in text-to-image is to instruct a pre-trained diffusion model to analyze new concepts introduced by users and incorporate them into expected styles. Recently, parameter-efficient fine-tuning (PEFT) approaches have been widely adopted to address this task and have greatly propelled the development of this field. Despite their popularity, existing efficient fine-tuning methods still struggle to achieve effective personalization and stylization in T2I generation. To address this issue, we propose block-wise Low-Rank Adaptation (LoRA) to perform fine-grained fine-tuning for different blocks of SD, which can generate images faithful to input prompts and target identity and also with desired style. Extensive experiments demonstrate the effectiveness of the proposed method.

Via

Access Paper or Ask Questions

DisControlFace: Disentangled Control for Personalized Facial Image Editing

Dec 11, 2023

Haozhe Jia, Yan Li, Hengfei Cui, Di Xu, Changpeng Yang, Yuwang Wang, Tao Yu

Figure 1 for DisControlFace: Disentangled Control for Personalized Facial Image Editing

Figure 2 for DisControlFace: Disentangled Control for Personalized Facial Image Editing

Figure 3 for DisControlFace: Disentangled Control for Personalized Facial Image Editing

Figure 4 for DisControlFace: Disentangled Control for Personalized Facial Image Editing

Abstract:In this work, we focus on exploring explicit fine-grained control of generative facial image editing, all while generating faithful and consistent personalized facial appearances. We identify the key challenge of this task as the exploration of disentangled conditional control in the generation process, and accordingly propose a novel diffusion-based framework, named DisControlFace, comprising two decoupled components. Specifically, we leverage an off-the-shelf diffusion reconstruction model as the backbone and freeze its pre-trained weights, which helps to reduce identity shift and recover editing-unrelated details of the input image. Furthermore, we construct a parallel control network that is compatible with the reconstruction backbone to generate spatial control conditions based on estimated explicit face parameters. Finally, we further reformulate the training pipeline into a masked-autoencoding form to effectively achieve disentangled training of our DisControlFace. Our DisControlNet can perform robust editing on any facial image through training on large-scale 2D in-the-wild portraits and also supports low-cost fine-tuning with few additional images to further learn diverse personalized priors of a specific person. Extensive experiments demonstrate that DisControlFace can generate realistic facial images corresponding to various face control conditions, while significantly improving the preservation of the personalized facial details.

Via

Access Paper or Ask Questions

HNF-Netv2 for Brain Tumor Segmentation using multi-modal MR Imaging

Feb 10, 2022

Haozhe Jia, Chao Bai, Weidong Cai, Heng Huang, Yong Xia

Figure 1 for HNF-Netv2 for Brain Tumor Segmentation using multi-modal MR Imaging

Figure 2 for HNF-Netv2 for Brain Tumor Segmentation using multi-modal MR Imaging

Figure 3 for HNF-Netv2 for Brain Tumor Segmentation using multi-modal MR Imaging

Figure 4 for HNF-Netv2 for Brain Tumor Segmentation using multi-modal MR Imaging

Abstract:In our previous work, $i.e.$, HNF-Net, high-resolution feature representation and light-weight non-local self-attention mechanism are exploited for brain tumor segmentation using multi-modal MR imaging. In this paper, we extend our HNF-Net to HNF-Netv2 by adding inter-scale and intra-scale semantic discrimination enhancing blocks to further exploit global semantic discrimination for the obtained high-resolution features. We trained and evaluated our HNF-Netv2 on the multi-modal Brain Tumor Segmentation Challenge (BraTS) 2021 dataset. The result on the test set shows that our HNF-Netv2 achieved the average Dice scores of 0.878514, 0.872985, and 0.924919, as well as the Hausdorff distances ($95\%$) of 8.9184, 16.2530, and 4.4895 for the enhancing tumor, tumor core, and whole tumor, respectively. Our method won the RSNA 2021 Brain Tumor AI Challenge Prize (Segmentation Task), which ranks 8th out of all 1250 submitted results.

* RSNA 2021 Brain Tumor AI Challenge Top Solution. arXiv admin note: substantial text overlap with arXiv:2012.15318

Via

Access Paper or Ask Questions

PSGR: Pixel-wise Sparse Graph Reasoning for COVID-19 Pneumonia Segmentation in CT Images

Aug 09, 2021

Haozhe Jia, Haoteng Tang, Guixiang Ma, Weidong Cai, Heng Huang, Liang Zhan, Yong Xia

Figure 1 for PSGR: Pixel-wise Sparse Graph Reasoning for COVID-19 Pneumonia Segmentation in CT Images

Figure 2 for PSGR: Pixel-wise Sparse Graph Reasoning for COVID-19 Pneumonia Segmentation in CT Images

Figure 3 for PSGR: Pixel-wise Sparse Graph Reasoning for COVID-19 Pneumonia Segmentation in CT Images

Figure 4 for PSGR: Pixel-wise Sparse Graph Reasoning for COVID-19 Pneumonia Segmentation in CT Images

Abstract:Automated and accurate segmentation of the infected regions in computed tomography (CT) images is critical for the prediction of the pathological stage and treatment response of COVID-19. Several deep convolutional neural networks (DCNNs) have been designed for this task, whose performance, however, tends to be suppressed by their limited local receptive fields and insufficient global reasoning ability. In this paper, we propose a pixel-wise sparse graph reasoning (PSGR) module and insert it into a segmentation network to enhance the modeling of long-range dependencies for COVID-19 infected region segmentation in CT images. In the PSGR module, a graph is first constructed by projecting each pixel on a node based on the features produced by the segmentation backbone, and then converted into a sparsely-connected graph by keeping only K strongest connections to each uncertain pixel. The long-range information reasoning is performed on the sparsely-connected graph to generate enhanced features. The advantages of this module are two-fold: (1) the pixel-wise mapping strategy not only avoids imprecise pixel-to-node projections but also preserves the inherent information of each pixel for global reasoning; and (2) the sparsely-connected graph construction results in effective information retrieval and reduction of the noise propagation. The proposed solution has been evaluated against four widely-used segmentation models on three public datasets. The results show that the segmentation model equipped with our PSGR module can effectively segment COVID-19 infected regions in CT images, outperforming all other competing models.

Via

Access Paper or Ask Questions

Boundary-aware Graph Reasoning for Semantic Segmentation

Aug 09, 2021

Haoteng Tang, Haozhe Jia, Weidong Cai, Heng Huang, Yong Xia, Liang Zhan

Figure 1 for Boundary-aware Graph Reasoning for Semantic Segmentation

Figure 2 for Boundary-aware Graph Reasoning for Semantic Segmentation

Figure 3 for Boundary-aware Graph Reasoning for Semantic Segmentation

Figure 4 for Boundary-aware Graph Reasoning for Semantic Segmentation

Abstract:In this paper, we propose a Boundary-aware Graph Reasoning (BGR) module to learn long-range contextual features for semantic segmentation. Rather than directly construct the graph based on the backbone features, our BGR module explores a reasonable way to combine segmentation erroneous regions with the graph construction scenario. Motivated by the fact that most hard-to-segment pixels broadly distribute on boundary regions, our BGR module uses the boundary score map as prior knowledge to intensify the graph node connections and thereby guide the graph reasoning focus on boundary regions. In addition, we employ an efficient graph convolution implementation to reduce the computational cost, which benefits the integration of our BGR module into current segmentation backbones. Extensive experiments on three challenging segmentation benchmarks demonstrate the effectiveness of our proposed BGR module for semantic segmentation.

Via

Access Paper or Ask Questions

H2NF-Net for Brain Tumor Segmentation using Multimodal MR Imaging: 2nd Place Solution to BraTS Challenge 2020 Segmentation Task

Dec 30, 2020

Haozhe Jia, Weidong Cai, Heng Huang, Yong Xia

Figure 1 for H2NF-Net for Brain Tumor Segmentation using Multimodal MR Imaging: 2nd Place Solution to BraTS Challenge 2020 Segmentation Task

Figure 2 for H2NF-Net for Brain Tumor Segmentation using Multimodal MR Imaging: 2nd Place Solution to BraTS Challenge 2020 Segmentation Task

Figure 3 for H2NF-Net for Brain Tumor Segmentation using Multimodal MR Imaging: 2nd Place Solution to BraTS Challenge 2020 Segmentation Task

Figure 4 for H2NF-Net for Brain Tumor Segmentation using Multimodal MR Imaging: 2nd Place Solution to BraTS Challenge 2020 Segmentation Task

Abstract:In this paper, we propose a Hybrid High-resolution and Non-local Feature Network (H2NF-Net) to segment brain tumor in multimodal MR images. Our H2NF-Net uses the single and cascaded HNF-Nets to segment different brain tumor sub-regions and combines the predictions together as the final segmentation. We trained and evaluated our model on the Multimodal Brain Tumor Segmentation Challenge (BraTS) 2020 dataset. The results on the test set show that the combination of the single and cascaded models achieved average Dice scores of 0.78751, 0.91290, and 0.85461, as well as Hausdorff distances ($95\%$) of 26.57525, 4.18426, and 4.97162 for the enhancing tumor, whole tumor, and tumor core, respectively. Our method won the second place in the BraTS 2020 challenge segmentation task out of nearly 80 participants.

* Accepted by MICCAI BrainLesion Workshop 2020

Via

Access Paper or Ask Questions

3D Global Convolutional Adversarial Network\\ for Prostate MR Volume Segmentation

Jul 18, 2018

Haozhe Jia, Yang Song, Donghao Zhang, Heng Huang, Dagan Feng, Michael Fulham, Yong Xia, Weidong Cai

$Figure 1 for 3D Global Convolutional Adversarial Network\\ for Prostate MR Volume Segmentation$

$Figure 2 for 3D Global Convolutional Adversarial Network\\ for Prostate MR Volume Segmentation$

$Figure 3 for 3D Global Convolutional Adversarial Network\\ for Prostate MR Volume Segmentation$

$Figure 4 for 3D Global Convolutional Adversarial Network\\ for Prostate MR Volume Segmentation$

Abstract:Advanced deep learning methods have been developed to conduct prostate MR volume segmentation in either a 2D or 3D fully convolutional manner. However, 2D methods tend to have limited segmentation performance, since large amounts of spatial information of prostate volumes are discarded during the slice-by-slice segmentation process; and 3D methods also have room for improvement, since they use isotropic kernels to perform 3D convolutions whereas most prostate MR volumes have anisotropic spatial resolution. Besides, the fully convolutional structural methods achieve good performance for localization issues but neglect the per-voxel classification for segmentation tasks. In this paper, we propose a 3D Global Convolutional Adversarial Network (3D GCA-Net) to address efficient prostate MR volume segmentation. We first design a 3D ResNet encoder to extract 3D features from prostate scans, and then develop the decoder, which is composed of a multi-scale 3D global convolutional block and a 3D boundary refinement block, to address the classification and localization issues simultaneously for volumetric segmentation. Additionally, we combine the encoder-decoder segmentation network with an adversarial network in the training phrase to enforce the contiguity of long-range spatial predictions. Throughout the proposed model, we use anisotropic convolutional processing for better feature learning on prostate MR scans. We evaluated our 3D GCA-Net model on two public prostate MR datasets and achieved state-of-the-art performances.

* 9 pages, 3 figures, 2 tables, 16 references

Via

Access Paper or Ask Questions