Abstract:Fisheye image rectification aims to correct distortions in images taken with fisheye cameras. Although current models show promising results on images with a degree of distortion similar to that of the training data, they produce sub-optimal results when the degree of distortion changes unless they are retrained. This lack of generalization to varying degrees of distortion limits their practical application. In this paper, we take a step further and enable effective distortion rectification for images with varying degrees of distortion without retraining. We propose a novel Query-based Controllable Distortion Rectification network for fisheye images (QueryCDR). In particular, we first present the Distortion-aware Learnable Query Mechanism (DLQM), which defines the latent spatial relationships for different distortion degrees as a series of learnable queries. Each query can be learned to obtain position-dependent rectification control conditions, providing control over the rectification process. Then, we propose two kinds of controllable modulating blocks that enable the control conditions to better guide the modulation of the distortion features. These core components cooperate with each other to effectively boost the generalization ability of the model across varying degrees of distortion. Extensive experiments on fisheye image datasets with different distortion degrees demonstrate that our approach achieves high-quality and controllable distortion rectification.
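To make the learnable-query idea concrete, here is a minimal PyTorch-style sketch of a distortion-aware query that yields position-dependent scale/shift conditions used to modulate encoder features. The class name, tensor shapes, and the FiLM-style modulation rule are illustrative assumptions for exposition, not the authors' implementation.

```python
# Hypothetical sketch: one learnable query map per distortion degree produces
# position-dependent control conditions that modulate distortion features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DistortionQueryModulator(nn.Module):
    def __init__(self, num_degrees: int, channels: int, grid: int = 16):
        super().__init__()
        # One learnable query map per distortion degree (position-dependent).
        self.queries = nn.Parameter(torch.randn(num_degrees, channels, grid, grid) * 0.02)
        # Project a query into per-position scale and shift control conditions.
        self.to_control = nn.Conv2d(channels, 2 * channels, kernel_size=1)

    def forward(self, feat: torch.Tensor, degree_idx: int) -> torch.Tensor:
        # feat: (B, C, H, W) distortion features from the encoder.
        q = self.queries[degree_idx].unsqueeze(0)                       # (1, C, g, g)
        q = F.interpolate(q, size=feat.shape[-2:], mode="bilinear", align_corners=False)
        scale, shift = self.to_control(q).chunk(2, dim=1)               # control conditions
        return feat * (1 + scale) + shift                               # modulated features

feat = torch.randn(2, 64, 32, 32)
mod = DistortionQueryModulator(num_degrees=5, channels=64)
print(mod(feat, degree_idx=2).shape)  # torch.Size([2, 64, 32, 32])
```

In a full model, such conditions would be consumed by the controllable modulating blocks at several scales; here a single modulation step is shown for brevity.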
Abstract:Continuous Generalized Category Discovery (C-GCD) aims to continually discover novel classes from unlabelled image sets while maintaining performance on old classes. In this paper, we propose a novel learning framework, dubbed Neighborhood Commonality-aware Evolution Network (NCENet), that tackles this task from the perspective of representation learning. Concretely, to learn discriminative representations for novel classes, a Neighborhood Commonality-aware Representation Learning (NCRL) module is designed, which exploits local commonalities derived from neighborhoods to guide the learning of representational differences between instances of different classes. To maintain the representation ability for old classes, a Bi-level Contrastive Knowledge Distillation (BCKD) module is designed, which leverages contrastive learning to perceive both the knowledge currently being learned and the knowledge already learned, and performs knowledge distillation between them. Extensive experiments conducted on CIFAR10, CIFAR100, and Tiny-ImageNet demonstrate the superior performance of NCENet compared to the previous state-of-the-art method. In particular, in the last incremental learning session on CIFAR100, the clustering accuracy of NCENet outperforms the second-best method by a margin of 3.09\% on old classes and 6.32\% on new classes. Our code will be publicly available at \href{https://github.com/xjtuYW/NCENet.git}{https://github.com/xjtuYW/NCENet.git}.
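As a rough illustration of neighborhood-commonality guidance, the hedged sketch below attracts each embedding to the mean of its k nearest neighbours and contrasts it against the commonalities of other samples. The loss form, k, and temperature are assumptions for exposition, not the NCRL objective as published.

```python
# Illustrative sketch: a contrastive loss where each instance's positive is the
# "commonality" (mean) of its k nearest neighbours in embedding space.
import torch
import torch.nn.functional as F

def neighborhood_commonality_loss(z: torch.Tensor, k: int = 5, tau: float = 0.1) -> torch.Tensor:
    # z: (N, D) instance embeddings from the encoder.
    z = F.normalize(z, dim=1)
    with torch.no_grad():
        sim = z @ z.t()                          # cosine similarity between all instances
        sim.fill_diagonal_(-1.0)                 # exclude each sample from its own neighborhood
        idx = sim.topk(k, dim=1).indices         # indices of the k nearest neighbours
    commonality = F.normalize(z[idx].mean(dim=1), dim=1)   # (N, D) local commonality
    logits = z @ commonality.t() / tau           # each instance vs. every commonality
    targets = torch.arange(z.size(0))            # positive: the instance's own commonality
    return F.cross_entropy(logits, targets)

z = torch.randn(32, 128, requires_grad=True)
print(neighborhood_commonality_loss(z, k=5).item())
```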
Abstract:New intent discovery is a crucial capability for task-oriented dialogue systems. Existing methods focus on transferring in-domain (IND) prior knowledge to out-of-domain (OOD) data through pre-training and clustering stages. They either handle the two processes in a pipeline manner, which leaves a gap between intent representation and the clustering process, or adopt typical contrastive clustering, which overlooks the potential supervision signals available in the whole dataset. Besides, they often deal with open intent discovery and OOD settings separately. To this end, we propose a Pseudo-Label enhanced Prototypical Contrastive Learning (PLPCL) model for unified intent discovery. We iteratively utilize pseudo-labels to explore potential positive/negative samples for contrastive learning and to bridge the gap between representation and clustering. To enable better knowledge transfer, we design a prototype learning method that integrates the supervised and pseudo-label signals from IND and OOD samples. In addition, our method has proven effective in two different settings of discovering new intents. Experiments on three benchmark datasets and two task settings demonstrate the effectiveness of our approach.
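A hedged sketch of prototypical contrastive learning driven by a mix of supervised and pseudo-labels is given below. The prototype construction, the down-weighting of pseudo-labelled samples, and all names are illustrative stand-ins rather than the PLPCL objective itself.

```python
# Illustrative sketch: class prototypes are built from both IND (true) labels and
# OOD pseudo-labels, and pseudo-labelled samples are down-weighted in the loss.
import torch
import torch.nn.functional as F

def prototypical_contrastive_loss(z, labels, pseudo_mask, tau: float = 0.1):
    # z: (N, D) utterance embeddings; labels: (N,) true IND labels or pseudo OOD labels.
    # pseudo_mask: (N,) bool, True where the label is a pseudo-label from clustering.
    z = F.normalize(z, dim=1)
    classes = labels.unique()                                          # sorted unique label ids
    protos = torch.stack([z[labels == c].mean(0) for c in classes])    # one prototype per class
    protos = F.normalize(protos, dim=1)
    logits = z @ protos.t() / tau
    targets = torch.searchsorted(classes, labels)                      # label value -> prototype index
    per_sample = F.cross_entropy(logits, targets, reduction="none")
    weights = torch.where(pseudo_mask, torch.tensor(0.5), torch.tensor(1.0))
    return (weights * per_sample).mean()

z = torch.randn(16, 64, requires_grad=True)
labels = torch.randint(0, 4, (16,))
pseudo = torch.rand(16) > 0.5
print(prototypical_contrastive_loss(z, labels, pseudo).item())
```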
Abstract:Diffusion models (DMs) have shown promising results on single-image super-resolution and other image-to-image translation tasks. Benefiting from more computational resources and longer inference times, they are able to yield more realistic images. Existing DM-based super-resolution methods attempt to achieve an overall average recovery across all regions via iterative refinement, ignoring that different regions of the input image require different numbers of timesteps to reconstruct. In this work, we observe that previous DM-based super-resolution methods waste computational resources on reconstructing imperceptible details. To improve the utilization of computational resources, we propose AdaDiffSR, a DM-based SR pipeline with a dynamic timestep sampling strategy (DTSS). Specifically, by introducing the multi-metrics latent entropy module (MMLE), we achieve dynamic perception of the latent spatial information gain during the denoising process, thereby guiding the dynamic selection of timesteps. In addition, we adopt a progressive feature injection module (PFJ), which dynamically injects the original image features into the denoising process based on the current information gain, so as to generate images with both fidelity and realism. Experiments show that our AdaDiffSR achieves performance comparable to current state-of-the-art DM-based SR methods while consuming fewer computational resources and less inference time on both synthetic and real-world datasets.
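The following simplified sketch conveys the idea of dynamic timestep selection: a cheap information-gain proxy (here, the change in a latent-entropy estimate) decides when further denoising steps stop paying off. The proxy, threshold, and dummy denoiser are placeholders for illustration, not the MMLE/DTSS modules.

```python
# Illustrative sketch: stop the reverse-diffusion loop early once the change in a
# simple latent-entropy estimate (a stand-in for "information gain") is negligible.
import torch

def latent_entropy(z: torch.Tensor, bins: int = 32) -> float:
    hist = torch.histc(z.float(), bins=bins)
    p = hist / hist.sum().clamp(min=1)
    p = p[p > 0]
    return float(-(p * p.log()).sum())

def denoise_with_dynamic_timesteps(z, denoise_step, max_steps: int = 50, min_gain: float = 1e-3):
    prev_h = latent_entropy(z)
    for t in reversed(range(max_steps)):
        z = denoise_step(z, t)                  # one reverse-diffusion update
        h = latent_entropy(z)
        if abs(h - prev_h) < min_gain:          # negligible information gain: stop early
            break
        prev_h = h
    return z

# Toy usage with a dummy denoiser that just shrinks the latent towards zero.
z = torch.randn(1, 4, 16, 16)
out = denoise_with_dynamic_timesteps(z, lambda z, t: 0.9 * z)
print(out.shape)
```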
Abstract:Few-shot class-incremental learning (FSCIL) aims to incrementally recognize new classes using a few samples while maintaining performance on previously learned classes. One effective way to address this challenge is to construct prototype-evolving classifiers. Despite the advances achieved by most existing methods, the classifier weights are simply initialized with mean features. Because representations for new classes are weak and biased, we argue that such a strategy is suboptimal. In this paper, we tackle this issue from two aspects. First, thanks to the development of foundation models, we employ a foundation model, CLIP, as the network backbone to provide a general representation for each class. Second, to generate a more reliable and comprehensive instance representation, we propose a Knowledge Adapter (KA) module that summarizes the data-specific knowledge from the training data and fuses it into the general representation. Additionally, to adapt the knowledge learned from the base classes to the upcoming classes, we propose a mechanism of Incremental Pseudo Episode Learning (IPEL) that simulates the actual FSCIL setting. Taken together, our proposed method, dubbed Knowledge Adaptation Network (KANet), achieves competitive performance on a wide range of datasets, including CIFAR100, CUB200, and ImageNet-R.
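To illustrate how data-specific knowledge might be fused into a frozen general representation, here is a minimal residual-adapter sketch in PyTorch. The learnable knowledge bank, the attention-based fusion, and the feature dimension are assumptions for exposition and do not reproduce the KA module.

```python
# Illustrative sketch: a small knowledge bank is queried by a frozen general
# feature (e.g. from CLIP) and the retrieved, data-specific signal is fused back
# through a residual connection.
import torch
import torch.nn as nn

class KnowledgeAdapter(nn.Module):
    def __init__(self, dim: int, num_slots: int = 64):
        super().__init__()
        # A small bank summarizing data-specific knowledge from the training set.
        self.knowledge = nn.Parameter(torch.randn(num_slots, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, general_feat: torch.Tensor) -> torch.Tensor:
        # general_feat: (B, D), e.g. a frozen CLIP image embedding.
        q = general_feat.unsqueeze(1)                              # (B, 1, D)
        kv = self.knowledge.unsqueeze(0).expand(q.size(0), -1, -1)
        specific, _ = self.attn(q, kv, kv)                         # query the knowledge bank
        return general_feat + specific.squeeze(1)                  # residual fusion

feat = torch.randn(8, 512)        # stand-in for CLIP features
adapter = KnowledgeAdapter(512)
print(adapter(feat).shape)        # torch.Size([8, 512])
```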
Abstract:Composed Image Retrieval (CIR) involves searching for target images based on an image-text pair query. While current methods treat this as a query-target matching problem, we argue that CIR triplets contain additional associations beyond this primary relation. In this paper, we identify two new relations within triplets, viewing each triplet as a graph node. First, we introduce the concept of text-bridged image alignment, where the query text serves as a bridge between the query image and the target image. We propose a hinge-based cross-attention mechanism to incorporate this relation into network learning. Second, we explore complementary text reasoning, considering CIR as a form of cross-modal retrieval in which the two images are composed to reason about the complementary text. To integrate these perspectives effectively, we design a twin attention-based compositor. By combining these complementary associations with the explicit query pair-target image relation, we establish a comprehensive set of constraints for CIR. Our framework, CaLa (Complementary Association Learning for Augmenting Composed Image Retrieval), leverages these insights. We evaluate CaLa on the CIRR and FashionIQ benchmarks with multiple backbones, demonstrating its superiority in composed image retrieval.
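A rough sketch of the text-bridged idea follows: the query text first attends to the query image, and the resulting "hinge" representation then attends to the target image. The two-stage cross-attention and all shapes are illustrative assumptions, not the hinge-based attention or twin compositor from the paper.

```python
# Illustrative sketch: the query text acts as a bridge between the query image
# tokens and the target image tokens via two chained cross-attention layers.
import torch
import torch.nn as nn

class TextBridgedAlignment(nn.Module):
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.text_to_query = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.bridge_to_target = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, text, query_img, target_img):
        # text: (B, Lt, D); query_img / target_img: (B, Li, D) token features.
        bridge, _ = self.text_to_query(text, query_img, query_img)    # text reads the query image
        aligned, _ = self.bridge_to_target(bridge, target_img, target_img)
        return aligned.mean(dim=1)                                    # pooled alignment feature

B, D = 2, 256
out = TextBridgedAlignment(D)(torch.randn(B, 8, D), torch.randn(B, 16, D), torch.randn(B, 16, D))
print(out.shape)  # torch.Size([2, 256])
```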
Abstract:Eliminating image blur produced by various kinds of motion has been a challenging problem. Dominant approaches rely heavily on model capacity to remove blur by reconstructing the residual from the blurry observation in feature space. These practices not only prevent the capture of spatially variant motion in the real world but also ignore the tailored handling of different motions in image space. In this paper, we propose a novel real-world deblurring filtering model called the Motion-adaptive Separable Collaborative (MISC) Filter. In particular, we use a motion estimation network to capture motion information from neighborhoods, thereby adaptively estimating the spatially-variant motion flow, mask, kernels, weights, and offsets that constitute the MISC Filter. The MISC Filter first aligns the motion-induced blurring patterns to the middle of the motion along the predicted flow direction, and then collaboratively filters the aligned image through the predicted kernels, weights, and offsets to generate the output. This design can handle more generalized and complex motion in a spatially differentiated manner. Furthermore, we analyze the relationship between the motion estimation network and the residual reconstruction network. Extensive experiments on four widely used benchmarks demonstrate that our method provides an effective solution for real-world motion blur removal and achieves state-of-the-art performance. Code is available at https://github.com/ChengxuLiu/MISCFilter
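The spatially-variant filtering at the core of such designs can be sketched as follows, where each pixel is filtered with its own predicted kernel. The alignment along the flow, the masks, and the offsets of the full MISC Filter are omitted; the unfold-based formulation below is only an assumption for exposition.

```python
# Illustrative sketch: apply a different k x k kernel at every pixel, with the
# kernels assumed to come from a motion estimation network.
import torch
import torch.nn.functional as F

def spatially_variant_filter(img: torch.Tensor, kernels: torch.Tensor, k: int = 5) -> torch.Tensor:
    # img: (B, C, H, W); kernels: (B, k*k, H, W), one k x k kernel per pixel.
    B, C, H, W = img.shape
    patches = F.unfold(img, kernel_size=k, padding=k // 2)        # (B, C*k*k, H*W)
    patches = patches.view(B, C, k * k, H, W)
    weights = torch.softmax(kernels, dim=1).unsqueeze(1)          # normalise per-pixel kernel
    return (patches * weights).sum(dim=2)                         # (B, C, H, W)

img = torch.randn(1, 3, 64, 64)
kernels = torch.randn(1, 25, 64, 64)      # would come from the motion estimation network
print(spatially_variant_filter(img, kernels).shape)   # torch.Size([1, 3, 64, 64])
```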
Abstract:Generalized Class Discovery (GCD) aims to dynamically assign labels to unlabelled data, drawing in part on knowledge learned from labelled data, where the unlabelled data may come from known or novel classes. The prevailing approach generally involves clustering across all data and learning class concepts via prototypical contrastive learning. However, existing methods largely hinge on the performance of clustering algorithms and are thus subject to their inherent limitations. First, the estimated cluster number is often smaller than the ground truth, leaving existing methods short of prototypes for comprehensive concept learning. To address this issue, we propose an adaptive probing mechanism that introduces learnable potential prototypes to expand the cluster prototypes (centers). As there is no ground truth for the potential prototypes, we develop a self-supervised prototype learning framework to optimize them in an end-to-end fashion. Second, clustering is computationally intensive, and the conventional strategy of clustering both labelled and unlabelled instances exacerbates this issue. To counteract this inefficiency, we opt to cluster only the unlabelled instances and subsequently expand the cluster prototypes with our introduced potential prototypes to efficiently explore novel classes. Despite the simplicity of our proposed method, extensive empirical analysis on a wide range of datasets confirms that our method consistently delivers state-of-the-art results. Specifically, our method surpasses the nearest competitor by a significant margin of \textbf{9.7}$\%$ on the Stanford Cars dataset and achieves \textbf{12$\times$} higher clustering efficiency on the Herbarium 19 dataset. We will make the code and checkpoints publicly available at \url{https://github.com/xjtuYW/PNP.git}.
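The following hedged sketch shows how estimated cluster centres could be expanded with learnable potential prototypes whose soft assignments are then optimized end-to-end. The assignment rule, temperature, and class names are illustrative rather than the paper's self-supervised prototype learning framework.

```python
# Illustrative sketch: fixed cluster centres (e.g. from k-means on unlabelled
# features) are concatenated with learnable potential prototypes, and instances
# are softly assigned to the expanded prototype set.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PotentialPrototypes(nn.Module):
    def __init__(self, cluster_centers: torch.Tensor, num_potential: int, dim: int):
        super().__init__()
        self.register_buffer("centers", F.normalize(cluster_centers, dim=1))
        self.potential = nn.Parameter(torch.randn(num_potential, dim) * 0.02)

    def assign(self, z: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
        protos = torch.cat([self.centers, F.normalize(self.potential, dim=1)], dim=0)
        return torch.softmax(F.normalize(z, dim=1) @ protos.t() / tau, dim=1)

centers = torch.randn(10, 128)                # e.g. k-means centres on unlabelled features
model = PotentialPrototypes(centers, num_potential=5, dim=128)
probs = model.assign(torch.randn(32, 128))
print(probs.shape)                            # torch.Size([32, 15])
```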
Abstract:Under-display camera (UDC) systems are the foundation of full-screen display devices, in which the lens is mounted under the display. The pixel array of light-emitting diodes used for display diffracts and attenuates incident light, causing various degradations as the light intensity changes. Unlike general video restoration, which recovers video by treating different degradation factors equally, video restoration for UDC systems is more challenging in that it requires removing diverse degradations over time while preserving temporal consistency. In this paper, we introduce a novel video restoration network, called D$^2$RNet, specifically designed for UDC systems. It employs a set of Decoupling Attention Modules (DAM) that effectively separate the various video degradation factors. More specifically, a soft mask generation function is proposed to decompose each frame into flare and haze components based on the diffraction arising from incident light of different intensities, followed by the proposed flare and haze removal components that leverage long- and short-term feature learning to handle the respective degradations. Such a design offers a targeted and effective solution to eliminating various types of degradation in UDC systems. We further extend our design to multiple scales to cope with the scale changes of degradation that often occur in long videos. To demonstrate the superiority of D$^2$RNet, we propose a large-scale UDC video benchmark built by gathering HDR videos and generating realistically degraded videos using the point spread function measured on a commercial UDC system. Extensive quantitative and qualitative evaluations demonstrate the superiority of D$^2$RNet compared to other state-of-the-art video restoration and UDC image restoration methods. Code is available at https://github.com/ChengxuLiu/DDRNet.git
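A toy sketch of intensity-based soft decoupling is given below: a luminance-driven soft mask splits each frame into flare-dominated and haze-dominated parts that are handled by separate branches. The mask function, thresholds, and single-convolution branches are placeholders for illustration, not the DAM design.

```python
# Illustrative sketch: bright (high-luminance) regions are routed to a flare
# branch and the remainder to a haze branch via a soft sigmoid mask.
import torch
import torch.nn as nn

class SoftDecoupling(nn.Module):
    def __init__(self, threshold: float = 0.8, sharpness: float = 20.0):
        super().__init__()
        self.threshold, self.sharpness = threshold, sharpness
        self.flare_branch = nn.Conv2d(3, 3, 3, padding=1)   # stand-ins for the removal branches
        self.haze_branch = nn.Conv2d(3, 3, 3, padding=1)

    def forward(self, frame: torch.Tensor) -> torch.Tensor:
        # frame: (B, 3, H, W) in [0, 1]; bright regions are treated as flare.
        luminance = frame.mean(dim=1, keepdim=True)
        mask = torch.sigmoid(self.sharpness * (luminance - self.threshold))
        flare = self.flare_branch(frame * mask)
        haze = self.haze_branch(frame * (1 - mask))
        return flare + haze

frame = torch.rand(1, 3, 64, 64)
print(SoftDecoupling()(frame).shape)    # torch.Size([1, 3, 64, 64])
```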
Abstract:Composed image retrieval, a task involving the search for a target image using a reference image and a complementary text as the query, has witnessed significant advancements owing to progress in cross-modal modeling. Unlike the general image-text retrieval problem with only one alignment relation, i.e., image-text, we argue for the existence of two types of relations in composed image retrieval. The explicit relation pertains to the reference image & complementary text-target image, which is commonly exploited by existing methods. Beyond this intuitive relation, our observations in practice have uncovered another implicit yet crucial relation, i.e., reference image & target image-complementary text, since we found that the complementary text can be inferred by studying the relation between the target image and the reference image. Regrettably, existing methods largely focus on leveraging the explicit relation to learn their networks, while overlooking the implicit relation. In response to this weakness, we propose a new framework for composed image retrieval, termed dual relation alignment, which integrates both explicit and implicit relations to fully exploit the correlations among the triplets. Specifically, we design a vision compositor to first fuse the reference image and the target image; the resulting representation then serves two roles: (1) a counterpart for semantic alignment with the complementary text, and (2) compensation for the complementary text to boost the explicit relation modeling, thereby implanting the implicit relation into the alignment learning. Our method is evaluated on two popular datasets, CIRR and FashionIQ, through extensive experiments. The results confirm the effectiveness of our dual-relation learning in substantially enhancing composed image retrieval performance.
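To make the dual-relation idea concrete, here is a simplified sketch in which a vision compositor fuses the reference and target features, aligns the result with the complementary text (implicit relation), and reuses it to strengthen the explicit query-target matching. The compositors, the detach-based compensation, and the cosine scores are assumptions for illustration, not the paper's architecture.

```python
# Illustrative sketch: two MLP compositors realise the implicit relation
# (reference + target -> text) and the explicit relation (reference + text -> target).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualRelation(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.vision_compositor = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.query_compositor = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, ref_img, text, tgt_img):
        # All inputs: (B, D) global features from an image/text encoder.
        implicit = self.vision_compositor(torch.cat([ref_img, tgt_img], dim=1))
        implicit_score = F.cosine_similarity(implicit, text)           # implicit relation
        query = self.query_compositor(torch.cat([ref_img, text + implicit.detach()], dim=1))
        explicit_score = F.cosine_similarity(query, tgt_img)           # explicit relation
        return explicit_score, implicit_score

B, D = 4, 256
scores = DualRelation(D)(torch.randn(B, D), torch.randn(B, D), torch.randn(B, D))
print([s.shape for s in scores])    # [torch.Size([4]), torch.Size([4])]
```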