Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xuefeng Zhang

HD-PiSSA: High-Rank Distributed Orthogonal Adaptation

May 24, 2025

Yiding Wang, Fauxu meng, Xuefeng Zhang, Fan Jiang, Pingzhi Tang, Muhan Zhang

Abstract:Existing parameter-efficient fine-tuning (PEFT) methods for large language models (LLMs), such as LoRA and PiSSA, constrain model updates to low-rank subspaces, limiting their expressiveness and leading to suboptimal performance on complex tasks. To address this, we introduce High-rank Distributed PiSSA (HD-PiSSA), a distributed PEFT approach that initializes orthogonal adapters across different devices and aggregates their delta updates collectively on W for fine-tuning. Unlike Data Parallel LoRA or PiSSA, which maintain identical adapters across all devices, HD-PiSSA assigns different principal components of the pre-trained weights to each GPU, significantly expanding the range of update directions. This results in over 16x higher effective updated ranks than data-parallel LoRA or PiSSA when fine-tuning on 8 GPUs with the same per-device adapter rank. Empirically, we evaluate HD-PiSSA across various challenging downstream tasks, including mathematics, code generation, and multi-task learning. In the multi-task setting, HD-PiSSA achieves average gains of 10.0 absolute points (14.63%) over LoRA and 4.98 points (6.60%) over PiSSA across 12 benchmarks, demonstrating its benefits from the extra optimization flexibility.

Via

Access Paper or Ask Questions

A Graph-based Verification Framework for Fact-Checking

Mar 10, 2025

Yani Huang, Richong Zhang, Zhijie Nie, Junfan Chen, Xuefeng Zhang

Abstract:Fact-checking plays a crucial role in combating misinformation. Existing methods using large language models (LLMs) for claim decomposition face two key limitations: (1) insufficient decomposition, introducing unnecessary complexity to the verification process, and (2) ambiguity of mentions, leading to incorrect verification results. To address these challenges, we suggest introducing a claim graph consisting of triplets to address the insufficient decomposition problem and reduce mention ambiguity through graph structure. Based on this core idea, we propose a graph-based framework, GraphFC, for fact-checking. The framework features three key components: graph construction, which builds both claim and evidence graphs; graph-guided planning, which prioritizes the triplet verification order; and graph-guided checking, which verifies the triples one by one between claim and evidence graphs. Extensive experiments show that GraphFC enables fine-grained decomposition while resolving referential ambiguities through relational constraints, achieving state-of-the-art performance across three datasets.

* 13pages, 4figures

Via

Access Paper or Ask Questions

Generative Regression Based Watch Time Prediction for Video Recommendation: Model and Performance

Dec 28, 2024

Hongxu Ma, Kai Tian, Tao Zhang, Xuefeng Zhang, Chunjie Chen, Han Li, Jihong Guan, Shuigeng Zhou

Figure 1 for Generative Regression Based Watch Time Prediction for Video Recommendation: Model and Performance

Figure 2 for Generative Regression Based Watch Time Prediction for Video Recommendation: Model and Performance

Figure 3 for Generative Regression Based Watch Time Prediction for Video Recommendation: Model and Performance

Figure 4 for Generative Regression Based Watch Time Prediction for Video Recommendation: Model and Performance

Abstract:Watch time prediction (WTP) has emerged as a pivotal task in short video recommendation systems, designed to encapsulate user interests. Predicting users' watch times on videos often encounters challenges, including wide value ranges and imbalanced data distributions, which can lead to significant bias when directly regressing watch time. Recent studies have tried to tackle these issues by converting the continuous watch time estimation into an ordinal classification task. While these methods are somewhat effective, they exhibit notable limitations. Inspired by language modeling, we propose a novel Generative Regression (GR) paradigm for WTP based on sequence generation. This approach employs structural discretization to enable the lossless reconstruction of original values while maintaining prediction fidelity. By formulating the prediction problem as a numerical-to-sequence mapping, and with meticulously designed vocabulary and label encodings, each watch time is transformed into a sequence of tokens. To expedite model training, we introduce the curriculum learning with an embedding mixup strategy which can mitigate training-and-inference inconsistency associated with teacher forcing. We evaluate our method against state-of-the-art approaches on four public datasets and one industrial dataset. We also perform online A/B testing on Kuaishou, a leading video app with about 400 million DAUs, to demonstrate the real-world efficacy of our method. The results conclusively show that GR outperforms existing techniques significantly. Furthermore, we successfully apply GR to another regression task in recommendation systems, i.e., Lifetime Value (LTV) prediction, which highlights its potential as a novel and effective solution to general regression challenges.

* 10 pages, 5 figures, conference or other essential info

Via

Access Paper or Ask Questions

SEMDR: A Semantic-Aware Dual Encoder Model for Legal Judgment Prediction with Legal Clue Tracing

Aug 19, 2024

Pengjie Liu, Wang Zhang, Yulong Ding, Xuefeng Zhang, Shuang-Hua Yang

Figure 1 for SEMDR: A Semantic-Aware Dual Encoder Model for Legal Judgment Prediction with Legal Clue Tracing

Figure 2 for SEMDR: A Semantic-Aware Dual Encoder Model for Legal Judgment Prediction with Legal Clue Tracing

Figure 3 for SEMDR: A Semantic-Aware Dual Encoder Model for Legal Judgment Prediction with Legal Clue Tracing

Figure 4 for SEMDR: A Semantic-Aware Dual Encoder Model for Legal Judgment Prediction with Legal Clue Tracing

Abstract:Legal Judgment Prediction (LJP) aims to form legal judgments based on the criminal fact description. However, researchers struggle to classify confusing criminal cases, such as robbery and theft, which requires LJP models to distinguish the nuances between similar crimes. Existing methods usually design handcrafted features to pick up necessary semantic legal clues to make more accurate legal judgment predictions. In this paper, we propose a Semantic-Aware Dual Encoder Model (SEMDR), which designs a novel legal clue tracing mechanism to conduct fine-grained semantic reasoning between criminal facts and instruments. Our legal clue tracing mechanism is built from three reasoning levels: 1) Lexicon-Tracing, which aims to extract criminal facts from criminal descriptions; 2) Sentence Representation Learning, which contrastively trains language models to better represent confusing criminal facts; 3) Multi-Fact Reasoning, which builds a reasons graph to propagate semantic clues among fact nodes to capture the subtle difference among criminal facts. Our legal clue tracing mechanism helps SEMDR achieve state-of-the-art on the CAIL2018 dataset and shows its advance in few-shot scenarios. Our experiments show that SEMDR has a strong ability to learn more uniform and distinguished representations for criminal facts, which helps to make more accurate predictions on confusing criminal cases and reduces the model uncertainty during making judgments. All codes will be released via GitHub.

Via

Access Paper or Ask Questions

Progressively Modality Freezing for Multi-Modal Entity Alignment

Jul 23, 2024

Yani Huang, Xuefeng Zhang, Richong Zhang, Junfan Chen, Jaein Kim

Figure 1 for Progressively Modality Freezing for Multi-Modal Entity Alignment

Figure 2 for Progressively Modality Freezing for Multi-Modal Entity Alignment

Figure 3 for Progressively Modality Freezing for Multi-Modal Entity Alignment

Figure 4 for Progressively Modality Freezing for Multi-Modal Entity Alignment

Abstract:Multi-Modal Entity Alignment aims to discover identical entities across heterogeneous knowledge graphs. While recent studies have delved into fusion paradigms to represent entities holistically, the elimination of features irrelevant to alignment and modal inconsistencies is overlooked, which are caused by inherent differences in multi-modal features. To address these challenges, we propose a novel strategy of progressive modality freezing, called PMF, that focuses on alignmentrelevant features and enhances multi-modal feature fusion. Notably, our approach introduces a pioneering cross-modal association loss to foster modal consistency. Empirical evaluations across nine datasets confirm PMF's superiority, demonstrating stateof-the-art performance and the rationale for freezing modalities. Our code is available at https://github.com/ninibymilk/PMF-MMEA.

* 13pages, 8 figures, Accepted by ACL2024

Via

Access Paper or Ask Questions