Peng Xiao

Restoring Real-World Images with an Internal Detail Enhancement Diffusion Model

May 24, 2025

Peng Xiao, Hongbo Zhao, Yijun Wang, Jianxin Lin

Abstract:Restoring real-world degraded images, such as old photographs or low-resolution images, presents a significant challenge due to the complex, mixed degradations they exhibit, such as scratches, color fading, and noise. Recent data-driven approaches have struggled with two main challenges: achieving high-fidelity restoration and providing object-level control over colorization. While diffusion models have shown promise in generating high-quality images with specific controls, they often fail to fully preserve image details during restoration. In this work, we propose an internal detail-preserving diffusion model for high-fidelity restoration of real-world degraded images. Our method utilizes a pre-trained Stable Diffusion model as a generative prior, eliminating the need to train a model from scratch. Central to our approach is the Internal Image Detail Enhancement (IIDE) technique, which directs the diffusion model to preserve essential structural and textural information while mitigating degradation effects. The process starts by mapping the input image into a latent space, where we inject the diffusion denoising process with degradation operations that simulate the effects of various degradation factors. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art models in both qualitative assessments and perceptual quantitative evaluations. Additionally, our approach supports text-guided restoration, enabling object-level colorization control that mimics the expertise of professional photo editing.

SCJD: Sparse Correlation and Joint Distillation for Efficient 3D Human Pose Estimation

Mar 18, 2025

Weihong Chen, Xuemiao Xu, Haoxin Yang, Yi Xie, Peng Xiao, Cheng Xu, Huaidong Zhang, Pheng-Ann Heng

Abstract:Existing 3D Human Pose Estimation (HPE) methods achieve high accuracy but suffer from computational overhead and slow inference, while knowledge distillation methods fail to address spatial relationships between joints and temporal correlations in multi-frame inputs. In this paper, we propose Sparse Correlation and Joint Distillation (SCJD), a novel framework that balances efficiency and accuracy for 3D HPE. SCJD introduces Sparse Correlation Input Sequence Downsampling to reduce redundancy in student network inputs while preserving inter-frame correlations. For effective knowledge transfer, we propose Dynamic Joint Spatial Attention Distillation, which includes Dynamic Joint Embedding Distillation to enhance the student's feature representation using the teacher's multi-frame context feature, and Adjacent Joint Attention Distillation to improve the student network's focus on adjacent joint relationships for better spatial understanding. Additionally, Temporal Consistency Distillation aligns the temporal correlations between teacher and student networks through upsampling and global supervision. Extensive experiments demonstrate that SCJD achieves state-of-the-art performance. Code is available at https://github.com/wileychan/SCJD.

MRJ-Agent: An Effective Jailbreak Agent for Multi-Round Dialogue

Nov 06, 2024

Fengxiang Wang, Ranjie Duan, Peng Xiao, Xiaojun Jia, YueFeng Chen, Chongwen Wang, Jialing Tao, Hang Su, Jun Zhu, Hui Xue

Abstract:Large Language Models (LLMs) demonstrate outstanding knowledge and understanding capabilities, but they have also been shown to be prone to illegal or unethical responses when subjected to jailbreak attacks. To ensure their responsible deployment in critical applications, it is crucial to understand the safety capabilities and vulnerabilities of LLMs. Previous works mainly focus on jailbreak in single-round dialogue, overlooking the potential jailbreak risks in multi-round dialogues, which are a vital way humans interact with and extract information from LLMs. Some recent studies have concentrated on the risks associated with jailbreak in multi-round dialogues. These efforts typically involve manually crafted templates or prompt engineering techniques. However, due to the inherent complexity of multi-round dialogues, their jailbreak performance is limited. To solve this problem, we propose a novel multi-round dialogue jailbreaking agent, emphasizing the importance of stealthiness in identifying and mitigating potential threats to human values posed by LLMs. We propose a risk decomposition strategy that distributes risks across multiple rounds of queries and utilizes psychological strategies to enhance attack strength. Extensive experiments show that our proposed method surpasses other attack methods and achieves a state-of-the-art attack success rate. We will make the corresponding code and dataset available for future research; the code will be released soon.

HR-Extreme: A High-Resolution Dataset for Extreme Weather Forecasting

Sep 27, 2024

Nian Ran, Peng Xiao, Yue Wang, Wesley Shi, Jianxin Lin, Qi Meng, Richard Allmendinger

Abstract:The application of large deep learning models in weather forecasting has led to significant advancements in the field, including higher-resolution forecasting and extended prediction periods, exemplified by models such as Pangu and Fuxi. Despite these successes, previous research has largely neglected extreme weather events, and the availability of datasets specifically curated for such events remains limited. Given the critical importance of accurately forecasting extreme weather, this study introduces a comprehensive dataset that incorporates high-resolution extreme weather cases derived from the High-Resolution Rapid Refresh (HRRR) data, a 3-km real-time dataset provided by NOAA. We also evaluate the current state-of-the-art deep learning models and Numerical Weather Prediction (NWP) systems on HR-Extreme, and provide an improved baseline deep learning model called HR-Heim, which has superior performance on both general loss and HR-Extreme compared to others. Our results reveal that the errors of extreme weather cases are significantly larger than the overall forecast error, highlighting them as a crucial source of loss in weather prediction. These findings underscore the necessity for future research to focus on improving the accuracy of extreme weather forecasts to enhance their practical utility.

* 10 pages, under review

DiffColor: Toward High Fidelity Text-Guided Image Colorization with Diffusion Models

Aug 03, 2023

Jianxin Lin, Peng Xiao, Yijun Wang, Rongju Zhang, Xiangxiang Zeng

Abstract:Recent data-driven image colorization methods have enabled automatic or reference-based colorization, while still suffering from unsatisfactory and inaccurate object-level color control. To address these issues, we propose a new method called DiffColor that leverages the power of pre-trained diffusion models to recover vivid colors conditioned on a prompt text, without any additional inputs. DiffColor mainly contains two stages: colorization with generative color prior and in-context controllable colorization. Specifically, we first fine-tune a pre-trained text-to-image model to generate colorized images using a CLIP-based contrastive loss. Then we try to obtain an optimized text embedding aligning the colorized image and the text prompt, and a fine-tuned diffusion model enabling high-quality image reconstruction. Our method can produce vivid and diverse colors with a few iterations, and keep the structure and background intact while having colors well-aligned with the target language guidance. Moreover, our method allows for in-context colorization, i.e., producing different colorization results by modifying prompt texts without any fine-tuning, and can achieve object-level controllable colorization results. Extensive experiments and user studies demonstrate that DiffColor outperforms previous works in terms of visual quality, color fidelity, and diversity of colorization options.

OMSN and FAROS: OCTA Microstructure Segmentation Network and Fully Annotated Retinal OCTA Segmentation Dataset

Dec 26, 2022

Peng Xiao, Xiaodong Hu, Ke Ma, Gengyuan Wang, Ziqing Feng, Yuancong Huang, Jin Yuan

Abstract:The lack of efficient segmentation methods and fully-labeled datasets limits the comprehensive assessment of optical coherence tomography angiography (OCTA) microstructures such as the retinal vessel network (RVN) and the foveal avascular zone (FAZ), which are of great value in evaluating ophthalmic and systemic diseases. Here, we introduce an innovative OCTA microstructure segmentation network (OMSN) that combines an encoder-decoder-based architecture with multi-scale skip connections and the split-attention-based residual network ResNeSt, paying specific attention to OCTA microstructural features while facilitating better model convergence and feature representations. The proposed OMSN achieves excellent single-task and multi-task performance for RVN and/or FAZ segmentation. Notably, multi-task models outperform single-task models on the evaluation metrics on the same dataset. On this basis, a fully annotated retinal OCTA segmentation (FAROS) dataset is constructed semi-automatically, filling the gap left by the absence of a pixel-level fully-labeled OCTA dataset. The OMSN multi-task segmentation model retrained with FAROS further confirms its outstanding accuracy for simultaneous RVN and FAZ segmentation.

* 10 pages, 6 figures, submitted to IEEE Transactions on Medical Imaging (TMI)

Bayesian Federated Neural Matching that Completes Full Information

Nov 15, 2022

Peng Xiao, Samuel Cheng

Abstract:Federated learning is a contemporary machine learning paradigm where locally trained models are distilled into a global model. Due to the intrinsic permutation invariance of neural networks, Probabilistic Federated Neural Matching (PFNM) employs a Bayesian nonparametric framework in the generation process of local neurons, and then creates a linear sum assignment formulation in each alternating optimization iteration. However, according to our theoretical analysis, the optimization iteration in PFNM omits existing global information. In this study, we propose a novel approach that overcomes this flaw by introducing a Kullback-Leibler divergence penalty at each iteration. The effectiveness of our approach is demonstrated by experiments on both image classification and semantic segmentation tasks.
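
The linear sum assignment step mentioned in the abstract can be illustrated with a toy example. The sketch below is not PFNM: it omits the Bayesian nonparametric prior and the proposed Kullback-Leibler penalty. It only shows why neurons stored in a different order must be aligned before averaging, assuming a squared-Euclidean matching cost and a brute-force search over permutations. Both choices, and all function and variable names, are hypothetical and made for illustration.

```python
from itertools import permutations


def match_neurons(global_w, local_w):
    """Return the permutation p minimizing sum_i ||global_w[i] - local_w[p[i]]||^2.

    Brute force over all permutations: fine for a toy example, whereas a
    practical solver would use the Hungarian algorithm.
    """
    def cost(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    n = len(global_w)
    return min(
        permutations(range(n)),
        key=lambda p: sum(cost(global_w[i], local_w[p[i]]) for i in range(n)),
    )


def fuse(global_w, local_w):
    """Average each global neuron with its matched local neuron."""
    p = match_neurons(global_w, local_w)
    return [
        [(x + y) / 2 for x, y in zip(global_w[i], local_w[p[i]])]
        for i in range(len(global_w))
    ]


# Three hidden neurons with two incoming weights each (made-up values).
global_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
# The same neurons, slightly perturbed and stored in a different order.
local_weights = [[-0.9, -1.1], [1.1, 0.1], [0.1, 0.9]]

perm = match_neurons(global_weights, local_weights)
fused = fuse(global_weights, local_weights)
```

Averaging the two weight lists index by index, without the matching step, would blend unrelated neurons; the permutation found here pairs each global neuron with its perturbed counterpart first.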

Investigating Neuron Disturbing in Fusing Heterogeneous Neural Networks

Oct 24, 2022

Biao Zhang, Peng Xiao, Shuqin Zhang

Abstract:Fusing deep learning models trained on separately located clients into a global model in a one-shot communication round is a straightforward implementation of Federated Learning. Although current model fusion methods have been shown experimentally to be valid for fusing neural networks with almost identical architectures, they are rarely analyzed theoretically. In this paper, we reveal the phenomenon of neuron disturbing, where neurons from heterogeneous local models interfere with one another. We give detailed explanations from a Bayesian viewpoint, combining the data heterogeneity among clients and the properties of neural networks. Furthermore, to validate our findings, we propose an experimental method, called AMS, that excludes neuron disturbing and fuses neural networks by adaptively selecting a local model to execute the prediction according to the input. The experiments demonstrate that AMS is more robust to data heterogeneity than general model fusion and ensemble methods. This implies the necessity of considering neuron disturbing in model fusion. Besides, as an experimental algorithm, AMS can fuse models with varying architectures, and we also list several possible extensions of AMS for future work.

* 16 pages, 3 figures

Potential Advantages of Peak Picking Multi-Voltage Threshold Digitizer in Energy Determination in Radiation Measurement

Mar 08, 2021

Kezhang Zhu, Junhua Mei, Yuming Su, Pingping Dai, Nicola D'Ascenzo, Hao Wang, Peng Xiao, Lin Wan, Qingguo Xie

Abstract:The Multi-voltage Threshold (MVT) method, which samples the signal at certain reference voltages, is well developed and has been adopted in pre-clinical and clinical digital positron emission tomography (PET) systems. To improve its energy measurement performance, we propose a Peak Picking MVT (PP-MVT) Digitizer in this paper. Firstly, a sampled Peak Point (the highest point in the pulse signal), which carries the values of amplitude feature voltage and amplitude arriving time, is added to traditional MVT with a simple peak sampling circuit. Secondly, an amplitude deviation statistical analysis, which compares the energy deviation of various reconstruction models, is used to select adaptive reconstruction models for signal pulses with different amplitudes. After processing 30,000 randomly-chosen pulses sampled by the oscilloscope with a 22Na point source, our method achieves an energy resolution of 17.50% within a 450-650 keV energy window, which is 2.44% better than the result of traditional MVT with the same thresholds, and a count of 15225 in the same energy window, compared with 14678 for MVT. When the PP-MVT involves fewer thresholds than traditional MVT, the advantages of better energy resolution and larger count number can still be maintained, which shows the robustness and the flexibility of the PP-MVT Digitizer. This improved method indicates that adding feature peak information can improve signal sampling and reconstruction, as demonstrated by the better performance in energy determination in radiation measurement.
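
The sampling idea described in the abstract can be sketched in software. The example below is a rough illustration, not the authors' digitizer: it assumes a bi-exponential pulse shape, hypothetical time constants and reference voltages, and finds threshold crossings by linear interpolation on a finely sampled waveform rather than with the hardware peak sampling circuit the abstract mentions. It collects the time points at which the pulse crosses each reference voltage (the MVT samples) and then adds the highest sample as the extra Peak Point.

```python
import math


def pulse(t, amplitude=1.0, tau_rise=2.0, tau_decay=40.0):
    """Assumed bi-exponential pulse (arbitrary voltage units, t in ns)."""
    if t < 0:
        return 0.0
    return amplitude * (math.exp(-t / tau_decay) - math.exp(-t / tau_rise))


def mvt_samples(times, volts, thresholds):
    """Return (time, voltage) points where the waveform crosses each threshold."""
    points = []
    for v_ref in thresholds:
        for k in range(len(times) - 1):
            v0, v1 = volts[k], volts[k + 1]
            # A crossing happens when v_ref lies between consecutive samples.
            if (v0 - v_ref) * (v1 - v_ref) < 0:
                frac = (v_ref - v0) / (v1 - v0)  # linear interpolation
                points.append((times[k] + frac * (times[k + 1] - times[k]), v_ref))
    return sorted(points)


def peak_point(times, volts):
    """Return (time, voltage) of the highest sample: the extra PP-MVT point."""
    k = max(range(len(volts)), key=volts.__getitem__)
    return times[k], volts[k]


times = [0.05 * k for k in range(4001)]  # 0 to 200 ns in 0.05 ns steps
volts = [pulse(t) for t in times]
thresholds = [0.1, 0.3, 0.5, 0.7]  # hypothetical reference voltages

mvt = mvt_samples(times, volts, thresholds)
pp_mvt = sorted(mvt + [peak_point(times, volts)])
```

Each threshold below the pulse maximum is crossed once on the rising edge and once on the falling edge, so four thresholds yield eight points; the peak adds a ninth point that constrains the amplitude directly when a pulse model is fitted to the samples.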

* 14 pages, 8 figures, 1 table

Graph-based Matched Field Localization for an Underwater Source

Jan 18, 2021

Peng Xiao, Lingji Xu, Liya Xu, Jianmin Yang, Qing Hu

Abstract:Matched Field Processing (MFP), which can be regarded as a generalized beamformer, locates underwater sources by matching the received data with replica vectors. In this paper, the MFP method is combined with a recently developed framework, Graph Signal Processing (GSP). Following the paradigm of GSP, a spatial adjacency matrix is constructed for the arbitrarily distributed sensors based on the Green's function, and the source is then located by utilizing the graph Fourier transform. The simulation results illustrate that the graph-based MFP outperforms the conventional MFP processors (the Bartlett processor and the Minimum Variance processor) in accuracy and robustness.
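
For context, the conventional Bartlett processor that the abstract uses as a baseline can be sketched as follows. This is not the graph-based method of the paper. It assumes a free-space Green's function exp(ikr)/r in place of a realistic ocean waveguide model, a single noise-free snapshot, and a made-up array geometry and search grid; all names are hypothetical.

```python
import cmath
import math


def replica(sensors, source, wavenumber):
    """Unit-norm replica vector from an assumed free-space Green's function."""
    field = []
    for sx, sz in sensors:
        r = math.hypot(sx - source[0], sz - source[1])
        field.append(cmath.exp(1j * wavenumber * r) / r)
    norm = math.sqrt(sum(abs(g) ** 2 for g in field))
    return [g / norm for g in field]


def bartlett_power(data, weights):
    """Bartlett output |w^H d|^2 for a single snapshot d."""
    return abs(sum(w.conjugate() * d for w, d in zip(weights, data))) ** 2


def locate(sensors, data, candidates, wavenumber):
    """Return the candidate position with the highest Bartlett power."""
    return max(
        candidates,
        key=lambda c: bartlett_power(data, replica(sensors, c, wavenumber)),
    )


# Hypothetical geometry: 8-element vertical array at range 0 m, depths 10..80 m.
sensors = [(0.0, 10.0 * (k + 1)) for k in range(8)]
wavenumber = 2 * math.pi * 500.0 / 1500.0  # 500 Hz tone, 1500 m/s sound speed
true_source = (500.0, 40.0)  # (range in m, depth in m)

# Noise-free "measured" snapshot generated by the same propagation model.
data = replica(sensors, true_source, wavenumber)

# Search grid over range and depth.
candidates = [
    (float(x), float(z))
    for x in range(100, 1001, 100)
    for z in range(10, 91, 10)
]
estimate = locate(sensors, data, candidates, wavenumber)
```

Because both the data and the replicas are unit-norm, the Bartlett power is at most 1 and reaches 1 only where the replica matches the data, which is how the processor picks out the source position on the grid.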
