Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Qingcai Chen

GeRe: Towards Efficient Anti-Forgetting in Continual Learning of LLM via General Samples Replay

Aug 06, 2025

Yunan Zhang, Shuoran Jiang, Mengchen Zhao, Yuefeng Li, Yang Fan, Xiangping Wu, Qingcai Chen

Abstract:The continual learning capability of large language models (LLMs) is crucial for advancing artificial general intelligence. However, continual fine-tuning LLMs across various domains often suffers from catastrophic forgetting, characterized by: 1) significant forgetting of their general capabilities, and 2) sharp performance declines in previously learned tasks. To simultaneously address both issues in a simple yet stable manner, we propose General Sample Replay (GeRe), a framework that use usual pretraining texts for efficient anti-forgetting. Beyond revisiting the most prevalent replay-based practices under GeRe, we further leverage neural states to introduce a enhanced activation states constrained optimization method using threshold-based margin (TM) loss, which maintains activation state consistency during replay learning. We are the first to validate that a small, fixed set of pre-collected general replay samples is sufficient to resolve both concerns--retaining general capabilities while promoting overall performance across sequential tasks. Indeed, the former can inherently facilitate the latter. Through controlled experiments, we systematically compare TM with different replay strategies under the GeRe framework, including vanilla label fitting, logit imitation via KL divergence and feature imitation via L1/L2 losses. Results demonstrate that TM consistently improves performance and exhibits better robustness. Our work paves the way for efficient replay of LLMs for the future. Our code and data are available at https://github.com/Qznan/GeRe.

Via

Access Paper or Ask Questions

Document Image Rectification Bases on Self-Adaptive Multitask Fusion

May 09, 2025

Heng Li, Xiangping Wu, Qingcai Chen

Abstract:Deformed document image rectification is essential for real-world document understanding tasks, such as layout analysis and text recognition. However, current multi-task methods -- such as background removal, 3D coordinate prediction, and text line segmentation -- often overlook the complementary features between tasks and their interactions. To address this gap, we propose a self-adaptive learnable multi-task fusion rectification network named SalmRec. This network incorporates an inter-task feature aggregation module that adaptively improves the perception of geometric distortions, enhances feature complementarity, and reduces negative interference. We also introduce a gating mechanism to balance features both within global tasks and between local tasks effectively. Experimental results on two English benchmarks (DIR300 and DocUNet) and one Chinese benchmark (DocReal) demonstrate that our method significantly improves rectification performance. Ablation studies further highlight the positive impact of different tasks on dewarping and the effectiveness of our proposed module.

Via

Access Paper or Ask Questions

CrossFormer: Cross-Segment Semantic Fusion for Document Segmentation

Apr 02, 2025

Tongke Ni, Yang Fan, Junru Zhou, Xiangping Wu, Qingcai Chen

Figure 1 for CrossFormer: Cross-Segment Semantic Fusion for Document Segmentation

Figure 2 for CrossFormer: Cross-Segment Semantic Fusion for Document Segmentation

Figure 3 for CrossFormer: Cross-Segment Semantic Fusion for Document Segmentation

Figure 4 for CrossFormer: Cross-Segment Semantic Fusion for Document Segmentation

Abstract:Text semantic segmentation involves partitioning a document into multiple paragraphs with continuous semantics based on the subject matter, contextual information, and document structure. Traditional approaches have typically relied on preprocessing documents into segments to address input length constraints, resulting in the loss of critical semantic information across segments. To address this, we present CrossFormer, a transformer-based model featuring a novel cross-segment fusion module that dynamically models latent semantic dependencies across document segments, substantially elevating segmentation accuracy. Additionally, CrossFormer can replace rule-based chunk methods within the Retrieval-Augmented Generation (RAG) system, producing more semantically coherent chunks that enhance its efficacy. Comprehensive evaluations confirm CrossFormer's state-of-the-art performance on public text semantic segmentation datasets, alongside considerable gains on RAG benchmarks.

* 10 pages, 4 figures

Via

Access Paper or Ask Questions

Reasoning Graph Enhanced Exemplars Retrieval for In-Context Learning

Sep 17, 2024

Yukang Lin, Bingchen Zhong, Shuoran Jiang, Joanna Siebert, Qingcai Chen

Figure 1 for Reasoning Graph Enhanced Exemplars Retrieval for In-Context Learning

Figure 2 for Reasoning Graph Enhanced Exemplars Retrieval for In-Context Learning

Figure 3 for Reasoning Graph Enhanced Exemplars Retrieval for In-Context Learning

Figure 4 for Reasoning Graph Enhanced Exemplars Retrieval for In-Context Learning

Abstract:Large language models(LLMs) have exhibited remarkable few-shot learning capabilities and unified the paradigm of NLP tasks through the in-context learning(ICL) technique. Despite the success of ICL, the quality of the exemplar demonstrations can significantly influence the LLM's performance. Existing exemplar selection methods mainly focus on the semantic similarity between queries and candidate exemplars. On the other hand, the logical connections between reasoning steps can be beneficial to depict the problem-solving process as well. In this paper, we proposes a novel method named Reasoning Graph-enhanced Exemplar Retrieval(RGER). RGER first quires LLM to generate an initial response, then expresses intermediate problem-solving steps to a graph structure. After that, it employs graph kernel to select exemplars with semantic and structural similarity. Extensive experiments demonstrate the structural relationship is helpful to the alignment of queries and candidate exemplars. The efficacy of RGER on math and logit reasoning tasks showcases its superiority over state-of-the-art retrieval-based approaches. Our code is released at https://github.com/Yukang-Lin/RGER.

Via

Access Paper or Ask Questions

Towards Faithful Explanations for Text Classification with Robustness Improvement and Explanation Guided Training

Dec 29, 2023

Dongfang Li, Baotian Hu, Qingcai Chen, Shan He

Figure 1 for Towards Faithful Explanations for Text Classification with Robustness Improvement and Explanation Guided Training

Figure 2 for Towards Faithful Explanations for Text Classification with Robustness Improvement and Explanation Guided Training

Figure 3 for Towards Faithful Explanations for Text Classification with Robustness Improvement and Explanation Guided Training

Figure 4 for Towards Faithful Explanations for Text Classification with Robustness Improvement and Explanation Guided Training

Abstract:Feature attribution methods highlight the important input tokens as explanations to model predictions, which have been widely applied to deep neural networks towards trustworthy AI. However, recent works show that explanations provided by these methods face challenges of being faithful and robust. In this paper, we propose a method with Robustness improvement and Explanation Guided training towards more faithful EXplanations (REGEX) for text classification. First, we improve model robustness by input gradient regularization technique and virtual adversarial training. Secondly, we use salient ranking to mask noisy tokens and maximize the similarity between model attention and feature attribution, which can be seen as a self-training procedure without importing other external information. We conduct extensive experiments on six datasets with five attribution methods, and also evaluate the faithfulness in the out-of-domain setting. The results show that REGEX improves fidelity metrics of explanations in all settings and further achieves consistent gains based on two randomization tests. Moreover, we show that using highlight explanations produced by REGEX to train select-then-predict models results in comparable task performance to the end-to-end method.

Via

Access Paper or Ask Questions

ZO-AdaMU Optimizer: Adapting Perturbation by the Momentum and Uncertainty in Zeroth-order Optimization

Dec 23, 2023

Shuoran Jiang, Qingcai Chen, Youchen Pan, Yang Xiang, Yukang Lin, Xiangping Wu, Chuanyi Liu, Xiaobao Song

Abstract:Lowering the memory requirement in full-parameter training on large models has become a hot research area. MeZO fine-tunes the large language models (LLMs) by just forward passes in a zeroth-order SGD optimizer (ZO-SGD), demonstrating excellent performance with the same GPU memory usage as inference. However, the simulated perturbation stochastic approximation for gradient estimate in MeZO leads to severe oscillations and incurs a substantial time overhead. Moreover, without momentum regularization, MeZO shows severe over-fitting problems. Lastly, the perturbation-irrelevant momentum on ZO-SGD does not improve the convergence rate. This study proposes ZO-AdaMU to resolve the above problems by adapting the simulated perturbation with momentum in its stochastic approximation. Unlike existing adaptive momentum methods, we relocate momentum on simulated perturbation in stochastic gradient approximation. Our convergence analysis and experiments prove this is a better way to improve convergence stability and rate in ZO-SGD. Extensive experiments demonstrate that ZO-AdaMU yields better generalization for LLMs fine-tuning across various NLP tasks than MeZO and its momentum variants.

Via

Access Paper or Ask Questions

TDeLTA: A Light-weight and Robust Table Detection Method based on Learning Text Arrangement

Dec 18, 2023

Yang Fan, Xiangping Wu, Qingcai Chen, Heng Li, Yan Huang, Zhixiang Cai, Qitian Wu

Figure 1 for TDeLTA: A Light-weight and Robust Table Detection Method based on Learning Text Arrangement

Figure 2 for TDeLTA: A Light-weight and Robust Table Detection Method based on Learning Text Arrangement

Figure 3 for TDeLTA: A Light-weight and Robust Table Detection Method based on Learning Text Arrangement

Figure 4 for TDeLTA: A Light-weight and Robust Table Detection Method based on Learning Text Arrangement

Abstract:The diversity of tables makes table detection a great challenge, leading to existing models becoming more tedious and complex. Despite achieving high performance, they often overfit to the table style in training set, and suffer from significant performance degradation when encountering out-of-distribution tables in other domains. To tackle this problem, we start from the essence of the table, which is a set of text arranged in rows and columns. Based on this, we propose a novel, light-weighted and robust Table Detection method based on Learning Text Arrangement, namely TDeLTA. TDeLTA takes the text blocks as input, and then models the arrangement of them with a sequential encoder and an attention module. To locate the tables precisely, we design a text-classification task, classifying the text blocks into 4 categories according to their semantic roles in the tables. Experiments are conducted on both the text blocks parsed from PDF and extracted by open-source OCR tools, respectively. Compared to several state-of-the-art methods, TDeLTA achieves competitive results with only 3.1M model parameters on the large-scale public datasets. Moreover, when faced with the cross-domain data under the 0-shot setting, TDeLTA outperforms baselines by a large margin of nearly 7%, which shows the strong robustness and transferability of the proposed model.

* AAAI 2024

Via

Access Paper or Ask Questions

Confounder Balancing in Adversarial Domain Adaptation for Pre-Trained Large Models Fine-Tuning

Oct 24, 2023

Shuoran Jiang, Qingcai Chen, Yang Xiang, Youcheng Pan, Xiangping Wu

Figure 1 for Confounder Balancing in Adversarial Domain Adaptation for Pre-Trained Large Models Fine-Tuning

Figure 2 for Confounder Balancing in Adversarial Domain Adaptation for Pre-Trained Large Models Fine-Tuning

Figure 3 for Confounder Balancing in Adversarial Domain Adaptation for Pre-Trained Large Models Fine-Tuning

Figure 4 for Confounder Balancing in Adversarial Domain Adaptation for Pre-Trained Large Models Fine-Tuning

Abstract:The excellent generalization, contextual learning, and emergence abilities in the pre-trained large models (PLMs) handle specific tasks without direct training data, making them the better foundation models in the adversarial domain adaptation (ADA) methods to transfer knowledge learned from the source domain to target domains. However, existing ADA methods fail to account for the confounder properly, which is the root cause of the source data distribution that differs from the target domains. This study proposes an adversarial domain adaptation with confounder balancing for PLMs fine-tuning (ADA-CBF). The ADA-CBF includes a PLM as the foundation model for a feature extractor, a domain classifier and a confounder classifier, and they are jointly trained with an adversarial loss. This loss is designed to improve the domain-invariant representation learning by diluting the discrimination in the domain classifier. At the same time, the adversarial loss also balances the confounder distribution among source and unmeasured domains in training. Compared to existing ADA methods, ADA-CBF can correctly identify confounders in domain-invariant features, thereby eliminating the confounder biases in the extracted features from PLMs. The confounder classifier in ADA-CBF is designed as a plug-and-play and can be applied in the confounder measurable, unmeasurable, or partially measurable environments. Empirical results on natural language processing and computer vision downstream tasks show that ADA-CBF outperforms the newest GPT-4, LLaMA2, ViT and ADA methods.

Via

Access Paper or Ask Questions

Enhancing Open-Domain Table Question Answering via Syntax- and Structure-aware Dense Retrieval

Sep 19, 2023

Nengzheng Jin, Dongfang Li, Junying Chen, Joanna Siebert, Qingcai Chen

Figure 1 for Enhancing Open-Domain Table Question Answering via Syntax- and Structure-aware Dense Retrieval

Figure 2 for Enhancing Open-Domain Table Question Answering via Syntax- and Structure-aware Dense Retrieval

Figure 3 for Enhancing Open-Domain Table Question Answering via Syntax- and Structure-aware Dense Retrieval

Figure 4 for Enhancing Open-Domain Table Question Answering via Syntax- and Structure-aware Dense Retrieval

Abstract:Open-domain table question answering aims to provide answers to a question by retrieving and extracting information from a large collection of tables. Existing studies of open-domain table QA either directly adopt text retrieval methods or consider the table structure only in the encoding layer for table retrieval, which may cause syntactical and structural information loss during table scoring. To address this issue, we propose a syntax- and structure-aware retrieval method for the open-domain table QA task. It provides syntactical representations for the question and uses the structural header and value representations for the tables to avoid the loss of fine-grained syntactical and structural information. Then, a syntactical-to-structural aggregator is used to obtain the matching score between the question and a candidate table by mimicking the human retrieval process. Experimental results show that our method achieves the state-of-the-art on the NQ-tables dataset and overwhelms strong baselines on a newly curated open-domain Text-to-SQL dataset.

* IJCNLP-AACL 2023

Via

Access Paper or Ask Questions

FashionSAP: Symbols and Attributes Prompt for Fine-grained Fashion Vision-Language Pre-training

Apr 11, 2023

Yunpeng Han, Lisai Zhang, Qingcai Chen, Zhijian Chen, Zhonghua Li, Jianxin Yang, Zhao Cao

Figure 1 for FashionSAP: Symbols and Attributes Prompt for Fine-grained Fashion Vision-Language Pre-training

Figure 2 for FashionSAP: Symbols and Attributes Prompt for Fine-grained Fashion Vision-Language Pre-training

Figure 3 for FashionSAP: Symbols and Attributes Prompt for Fine-grained Fashion Vision-Language Pre-training

Figure 4 for FashionSAP: Symbols and Attributes Prompt for Fine-grained Fashion Vision-Language Pre-training

Abstract:Fashion vision-language pre-training models have shown efficacy for a wide range of downstream tasks. However, general vision-language pre-training models pay less attention to fine-grained domain features, while these features are important in distinguishing the specific domain tasks from general tasks. We propose a method for fine-grained fashion vision-language pre-training based on fashion Symbols and Attributes Prompt (FashionSAP) to model fine-grained multi-modalities fashion attributes and characteristics. Firstly, we propose the fashion symbols, a novel abstract fashion concept layer, to represent different fashion items and to generalize various kinds of fine-grained fashion features, making modelling fine-grained attributes more effective. Secondly, the attributes prompt method is proposed to make the model learn specific attributes of fashion items explicitly. We design proper prompt templates according to the format of fashion data. Comprehensive experiments are conducted on two public fashion benchmarks, i.e., FashionGen and FashionIQ, and FashionSAP gets SOTA performances for four popular fashion tasks. The ablation study also shows the proposed abstract fashion symbols, and the attribute prompt method enables the model to acquire fine-grained semantics in the fashion domain effectively. The obvious performance gains from FashionSAP provide a new baseline for future fashion task research.

Via

Access Paper or Ask Questions