Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ronghua Liang

Supervised makeup transfer with a curated dataset: Decoupling identity and makeup features for enhanced transformation

Jan 31, 2026

Qihe Pan, Yiming Wu, Xing Zhao, Liang Xie, Guodao Sun, Ronghua Liang

Abstract:Diffusion models have recently shown strong progress in generative tasks, offering a more stable alternative to GAN-based approaches for makeup transfer. Existing methods often suffer from limited datasets, poor disentanglement between identity and makeup features, and weak controllability. To address these issues, we make three contributions. First, we construct a curated high-quality dataset using a train-generate-filter-retrain strategy that combines synthetic, realistic, and filtered samples to improve diversity and fidelity. Second, we design a diffusion-based framework that disentangles identity and makeup features, ensuring facial structure and skin tone are preserved while applying accurate and diverse cosmetic styles. Third, we propose a text-guided mechanism that allows fine-grained and region-specific control, enabling users to modify eyes, lips, or face makeup with natural language prompts. Experiments on benchmarks and real-world scenarios demonstrate improvements in fidelity, identity preservation, and flexibility. Examples of our dataset can be found at: https://makeup-adapter.github.io.

* This paper has been accepted for publication in the proceedings of 2026 IEEE ICASSP Conference

Via

Access Paper or Ask Questions

InfoAffect: A Dataset for Affective Analysis of Infographics

Nov 09, 2025

Zihang Fu, Yunchao Wang, Chenyu Huang, Guodao Sun, Ronghua Liang

Abstract:Infographics are widely used to convey complex information, yet their affective dimensions remain underexplored due to the scarcity of data resources. We introduce a 3.5k-sample affect-annotated InfoAffect dataset, which combines textual content with real-world infographics. We first collect the raw data from six domains and aligned them via preprocessing, the accompanied-text-priority method, and three strategies to guarantee the quality and compliance. After that we construct an affect table and use it to constrain annotation. Five state-of-the-art multimodal large language models (MLLMs) then analyze both modalities, and their outputs are fused with Reciprocal Rank Fusion (RRF) algorithm to yield robust affects and confidences. We conducted a user study with two experiments to validate usability and assess InfoAffect dataset using the Composite Affect Consistency Index (CACI), achieving an overall score of 0.986, which indicates high accuracy.

Via

Access Paper or Ask Questions

DOMR: Establishing Cross-View Segmentation via Dense Object Matching

Aug 06, 2025

Jitong Liao, Yulu Gao, Shaofei Huang, Jialin Gao, Jie Lei, Ronghua Liang, Si Liu

Abstract:Cross-view object correspondence involves matching objects between egocentric (first-person) and exocentric (third-person) views. It is a critical yet challenging task for visual understanding. In this work, we propose the Dense Object Matching and Refinement (DOMR) framework to establish dense object correspondences across views. The framework centers around the Dense Object Matcher (DOM) module, which jointly models multiple objects. Unlike methods that directly match individual object masks to image features, DOM leverages both positional and semantic relationships among objects to find correspondences. DOM integrates a proposal generation module with a dense matching module that jointly encodes visual, spatial, and semantic cues, explicitly constructing inter-object relationships to achieve dense matching among objects. Furthermore, we combine DOM with a mask refinement head designed to improve the completeness and accuracy of the predicted masks, forming the complete DOMR framework. Extensive evaluations on the Ego-Exo4D benchmark demonstrate that our approach achieves state-of-the-art performance with a mean IoU of 49.7% on Ego$\to$Exo and 55.2% on Exo$\to$Ego. These results outperform those of previous methods by 5.8% and 4.3%, respectively, validating the effectiveness of our integrated approach for cross-view understanding.

* Accepted by ACM MM 2025

Via

Access Paper or Ask Questions

Towards Robust Multimodal Emotion Recognition under Missing Modalities and Distribution Shifts

Jun 12, 2025

Guowei Zhong, Ruohong Huan, Mingzhen Wu, Ronghua Liang, Peng Chen

Figure 1 for Towards Robust Multimodal Emotion Recognition under Missing Modalities and Distribution Shifts

Figure 2 for Towards Robust Multimodal Emotion Recognition under Missing Modalities and Distribution Shifts

Figure 3 for Towards Robust Multimodal Emotion Recognition under Missing Modalities and Distribution Shifts

Figure 4 for Towards Robust Multimodal Emotion Recognition under Missing Modalities and Distribution Shifts

Abstract:Recent advancements in Multimodal Emotion Recognition (MER) face challenges in addressing both modality missing and Out-Of-Distribution (OOD) data simultaneously. Existing methods often rely on specific models or introduce excessive parameters, which limits their practicality. To address these issues, we propose a novel robust MER framework, Causal Inference Distiller (CIDer), and introduce a new task, Random Modality Feature Missing (RMFM), to generalize the definition of modality missing. CIDer integrates two key components: a Model-Specific Self-Distillation (MSSD) module and a Model-Agnostic Causal Inference (MACI) module. MSSD enhances robustness under the RMFM task through a weight-sharing self-distillation approach applied across low-level features, attention maps, and high-level representations. Additionally, a Word-level Self-aligned Attention Module (WSAM) reduces computational complexity, while a Multimodal Composite Transformer (MCT) facilitates efficient multimodal fusion. To tackle OOD challenges, MACI employs a tailored causal graph to mitigate label and language biases using a Multimodal Causal Module (MCM) and fine-grained counterfactual texts. Notably, MACI can independently enhance OOD generalization with minimal additional parameters. Furthermore, we also introduce the new repartitioned MER OOD datasets. Experimental results demonstrate that CIDer achieves robust performance in both RMFM and OOD scenarios, with fewer parameters and faster training compared to state-of-the-art methods. The implementation of this work is publicly accessible at https://github.com/gw-zhong/CIDer.

* Submitted to TAC. The code is available at https://github.com/gw-zhong/CIDer

Via

Access Paper or Ask Questions

Towards Small Object Editing: A Benchmark Dataset and A Training-Free Approach

Nov 03, 2024

Qihe Pan, Zhen Zhao, Zicheng Wang, Sifan Long, Yiming Wu, Wei Ji, Haoran Liang, Ronghua Liang

Figure 1 for Towards Small Object Editing: A Benchmark Dataset and A Training-Free Approach

Figure 2 for Towards Small Object Editing: A Benchmark Dataset and A Training-Free Approach

Figure 3 for Towards Small Object Editing: A Benchmark Dataset and A Training-Free Approach

Figure 4 for Towards Small Object Editing: A Benchmark Dataset and A Training-Free Approach

Abstract:A plethora of text-guided image editing methods has recently been developed by leveraging the impressive capabilities of large-scale diffusion-based generative models especially Stable Diffusion. Despite the success of diffusion models in producing high-quality images, their application to small object generation has been limited due to difficulties in aligning cross-modal attention maps between text and these objects. Our approach offers a training-free method that significantly mitigates this alignment issue with local and global attention guidance , enhancing the model's ability to accurately render small objects in accordance with textual descriptions. We detail the methodology in our approach, emphasizing its divergence from traditional generation techniques and highlighting its advantages. What's more important is that we also provide~\textit{SOEBench} (Small Object Editing), a standardized benchmark for quantitatively evaluating text-based small object generation collected from \textit{MSCOCO} and \textit{OpenImage}. Preliminary results demonstrate the effectiveness of our method, showing marked improvements in the fidelity and accuracy of small object generation compared to existing models. This advancement not only contributes to the field of AI and computer vision but also opens up new possibilities for applications in various industries where precise image generation is critical. We will release our dataset on our project page: \href{https://soebench.github.io/}{https://soebench.github.io/}.

* 9 pages, 8 figures, Accepted by ACMMM 2024

Via

Access Paper or Ask Questions

Self-distilled Dynamic Fusion Network for Language-based Fashion Retrieval

May 24, 2024

Yiming Wu, Hangfei Li, Fangfang Wang, Yilong Zhang, Ronghua Liang

Figure 1 for Self-distilled Dynamic Fusion Network for Language-based Fashion Retrieval

Figure 2 for Self-distilled Dynamic Fusion Network for Language-based Fashion Retrieval

Figure 3 for Self-distilled Dynamic Fusion Network for Language-based Fashion Retrieval

Figure 4 for Self-distilled Dynamic Fusion Network for Language-based Fashion Retrieval

Abstract:In the domain of language-based fashion image retrieval, pinpointing the desired fashion item using both a reference image and its accompanying textual description is an intriguing challenge. Existing approaches lean heavily on static fusion techniques, intertwining image and text. Despite their commendable advancements, these approaches are still limited by a deficiency in flexibility. In response, we propose a Self-distilled Dynamic Fusion Network to compose the multi-granularity features dynamically by considering the consistency of routing path and modality-specific information simultaneously. Two new modules are included in our proposed method: (1) Dynamic Fusion Network with Modality Specific Routers. The dynamic network enables a flexible determination of the routing for each reference image and modification text, taking into account their distinct semantics and distributions. (2) Self Path Distillation Loss. A stable path decision for queries benefits the optimization of feature extraction as well as routing, and we approach this by progressively refine the path decision with previous path information. Extensive experiments demonstrate the effectiveness of our proposed model compared to existing methods.

* ICASSP 2024

Via

Access Paper or Ask Questions

SOEDiff: Efficient Distillation for Small Object Editing

May 15, 2024

Qihe Pan, Zicheng Wang, Zhen Zhao, Yiming Wu, Sifan Long, Haoran Liang, Ronghua Liang

Abstract:In this paper, we delve into a new task known as small object editing (SOE), which focuses on text-based image inpainting within a constrained, small-sized area. Despite the remarkable success have been achieved by current image inpainting approaches, their application to the SOE task generally results in failure cases such as Object Missing, Text-Image Mismatch, and Distortion. These failures stem from the limited use of small-sized objects in training datasets and the downsampling operations employed by U-Net models, which hinders accurate generation. To overcome these challenges, we introduce a novel training-based approach, SOEDiff, aimed at enhancing the capability of baseline models like StableDiffusion in editing small-sized objects while minimizing training costs. Specifically, our method involves two key components: SO-LoRA, which efficiently fine-tunes low-rank matrices, and Cross-Scale Score Distillation loss, which leverages high-resolution predictions from the pre-trained teacher diffusion model. Our method presents significant improvements on the test dataset collected from MSCOCO and OpenImage, validating the effectiveness of our proposed method in small object editing. In particular, when comparing SOEDiff with SD-I model on the OpenImage-f dataset, we observe a 0.99 improvement in CLIP-Score and a reduction of 2.87 in FID. Our project page can be found in https://soediff.github.io/.

Via

Access Paper or Ask Questions

C5: Towards Better Conversation Comprehension and Contextual Continuity for ChatGPT

Aug 10, 2023

Pan Liang, Danwei Ye, Zihao Zhu, Yunchao Wang, Wang Xia, Ronghua Liang, Guodao Sun

Figure 1 for C5: Towards Better Conversation Comprehension and Contextual Continuity for ChatGPT

Figure 2 for C5: Towards Better Conversation Comprehension and Contextual Continuity for ChatGPT

Figure 3 for C5: Towards Better Conversation Comprehension and Contextual Continuity for ChatGPT

Figure 4 for C5: Towards Better Conversation Comprehension and Contextual Continuity for ChatGPT

Abstract:Large language models (LLMs), such as ChatGPT, have demonstrated outstanding performance in various fields, particularly in natural language understanding and generation tasks. In complex application scenarios, users tend to engage in multi-turn conversations with ChatGPT to keep contextual information and obtain comprehensive responses. However, human forgetting and model contextual forgetting remain prominent issues in multi-turn conversation scenarios, which challenge the users' conversation comprehension and contextual continuity for ChatGPT. To address these challenges, we propose an interactive conversation visualization system called C5, which includes Global View, Topic View, and Context-associated Q\&A View. The Global View uses the GitLog diagram metaphor to represent the conversation structure, presenting the trend of conversation evolution and supporting the exploration of locally salient features. The Topic View is designed to display all the question and answer nodes and their relationships within a topic using the structure of a knowledge graph, thereby display the relevance and evolution of conversations. The Context-associated Q\&A View consists of three linked views, which allow users to explore individual conversations deeply while providing specific contextual information when posing questions. The usefulness and effectiveness of C5 were evaluated through a case study and a user study.

Via

Access Paper or Ask Questions

AFPN: Asymptotic Feature Pyramid Network for Object Detection

Jun 28, 2023

Guoyu Yang, Jie Lei, Zhikuan Zhu, Siyu Cheng, Zunlei Feng, Ronghua Liang

Abstract:Multi-scale features are of great importance in encoding objects with scale variance in object detection tasks. A common strategy for multi-scale feature extraction is adopting the classic top-down and bottom-up feature pyramid networks. However, these approaches suffer from the loss or degradation of feature information, impairing the fusion effect of non-adjacent levels. This paper proposes an asymptotic feature pyramid network (AFPN) to support direct interaction at non-adjacent levels. AFPN is initiated by fusing two adjacent low-level features and asymptotically incorporates higher-level features into the fusion process. In this way, the larger semantic gap between non-adjacent levels can be avoided. Given the potential for multi-object information conflicts to arise during feature fusion at each spatial location, adaptive spatial fusion operation is further utilized to mitigate these inconsistencies. We incorporate the proposed AFPN into both two-stage and one-stage object detection frameworks and evaluate with the MS-COCO 2017 validation and test datasets. Experimental evaluation shows that our method achieves more competitive results than other state-of-the-art feature pyramid networks. The code is available at \href{https://github.com/gyyang23/AFPN}{https://github.com/gyyang23/AFPN}.

Via

Access Paper or Ask Questions

Internal Structure Attention Network for Fingerprint Presentation Attack Detection from Optical Coherence Tomography

Mar 20, 2023

Haohao Sun, Yilong Zhang, Peng Chen, Haixia Wang, Ronghua Liang

Figure 1 for Internal Structure Attention Network for Fingerprint Presentation Attack Detection from Optical Coherence Tomography

Figure 2 for Internal Structure Attention Network for Fingerprint Presentation Attack Detection from Optical Coherence Tomography

Figure 3 for Internal Structure Attention Network for Fingerprint Presentation Attack Detection from Optical Coherence Tomography

Figure 4 for Internal Structure Attention Network for Fingerprint Presentation Attack Detection from Optical Coherence Tomography

Abstract:As a non-invasive optical imaging technique, optical coherence tomography (OCT) has proven promising for automatic fingerprint recognition system (AFRS) applications. Diverse approaches have been proposed for OCT-based fingerprint presentation attack detection (PAD). However, considering the complexity and variety of PA samples, it is extremely challenging to increase the generalization ability with the limited PA dataset. To solve the challenge, this paper presents a novel supervised learning-based PAD method, denoted as ISAPAD, which applies prior knowledge to guide network training and enhance the generalization ability. The proposed dual-branch architecture can not only learns global features from the OCT image, but also concentrate on layered structure feature which comes from the internal structure attention module (ISAM). The simple yet effective ISAM enables the proposed network to obtain layered segmentation features belonging only to Bonafide from noisy OCT volume data directly. Combined with effective training strategies and PAD score generation rules, ISAPAD obtains optimal PAD performance in limited training data. Domain generalization experiments and visualization analysis validate the effectiveness of the proposed method for OCT PAD.

* 12 pages, 14 figures

Via

Access Paper or Ask Questions