Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yulong Zhang

Come Together, But Not Right Now: A Progressive Strategy to Boost Low-Rank Adaptation

Jun 06, 2025

Zhan Zhuang, Xiequn Wang, Wei Li, Yulong Zhang, Qiushi Huang, Shuhao Chen, Xuehao Wang, Yanbin Wei, Yuhe Nie, Kede Ma(+2 more)

Abstract:Low-rank adaptation (LoRA) has emerged as a leading parameter-efficient fine-tuning technique for adapting large foundation models, yet it often locks adapters into suboptimal minima near their initialization. This hampers model generalization and limits downstream operators such as adapter merging and pruning. Here, we propose CoTo, a progressive training strategy that gradually increases adapters' activation probability over the course of fine-tuning. By stochastically deactivating adapters, CoTo encourages more balanced optimization and broader exploration of the loss landscape. We provide a theoretical analysis showing that CoTo promotes layer-wise dropout stability and linear mode connectivity, and we adopt a cooperative-game approach to quantify each adapter's marginal contribution. Extensive experiments demonstrate that CoTo consistently boosts single-task performance, enhances multi-task merging accuracy, improves pruning robustness, and reduces training overhead, all while remaining compatible with diverse LoRA variants. Code is available at https://github.com/zwebzone/coto.

* Accepted by ICML 2025. Code link: https://github.com/zwebzone/coto

Via

Access Paper or Ask Questions

Open the Eyes of MPNN: Vision Enhances MPNN in Link Prediction

May 13, 2025

Yanbin Wei, Xuehao Wang, Zhan Zhuang, Yang Chen, Shuhao Chen, Yulong Zhang, Yu Zhang, James Kwok

Abstract:Message-passing graph neural networks (MPNNs) and structural features (SFs) are cornerstones for the link prediction task. However, as a common and intuitive mode of understanding, the potential of visual perception has been overlooked in the MPNN community. For the first time, we equip MPNNs with vision structural awareness by proposing an effective framework called Graph Vision Network (GVN), along with a more efficient variant (E-GVN). Extensive empirical results demonstrate that with the proposed frameworks, GVN consistently benefits from the vision enhancement across seven link prediction datasets, including challenging large-scale graphs. Such improvements are compatible with existing state-of-the-art (SOTA) methods and GVNs achieve new SOTA results, thereby underscoring a promising novel direction for link prediction.

* ICML 2025

Via

Access Paper or Ask Questions

Consensus Entropy: Harnessing Multi-VLM Agreement for Self-Verifying and Self-Improving OCR

Apr 16, 2025

Yulong Zhang, Tianyi Liang, Xinyue Huang, Erfei Cui, Xu Guo, Pei Chu, Chenhui Li, Ru Zhang, Wenhai Wang, Gongshen Liu

Abstract:The Optical Character Recognition (OCR) task is important for evaluating Vision-Language Models (VLMs) and providing high-quality data sources for LLM training data. While state-of-the-art VLMs show improved average OCR accuracy, they still struggle with sample-level quality degradation and lack reliable automatic detection of low-quality outputs. We introduce Consensus Entropy (CE), a training-free post-inference method that quantifies OCR uncertainty by aggregating outputs from multiple VLMs. Our approach exploits a key insight: correct VLM OCR predictions converge in output space while errors diverge. We develop a lightweight multi-model framework that effectively identifies problematic samples, selects the best outputs and combines model strengths. Experiments across multiple OCR benchmarks and VLMs demonstrate that CE outperforms VLM-as-judge approaches and single-model baselines at the same cost and achieves state-of-the-art results across multiple metrics. For instance, our solution demonstrates: achieving 15.2% higher F1 scores than VLM-as-judge methods in quality verification, delivering 6.0% accuracy gains on mathematical calculation tasks, and requiring rephrasing only 7.3% of inputs while maintaining overall performance. Notably, the entire process requires neither training nor supervision while maintaining plug-and-play functionality throughout.

Via

Access Paper or Ask Questions

A Dual-Agent Adversarial Framework for Robust Generalization in Deep Reinforcement Learning

Jan 29, 2025

Zhengpeng Xie, Jiahang Cao, Yulong Zhang, Qiang Zhang, Renjing Xu

Figure 1 for A Dual-Agent Adversarial Framework for Robust Generalization in Deep Reinforcement Learning

Figure 2 for A Dual-Agent Adversarial Framework for Robust Generalization in Deep Reinforcement Learning

Figure 3 for A Dual-Agent Adversarial Framework for Robust Generalization in Deep Reinforcement Learning

Figure 4 for A Dual-Agent Adversarial Framework for Robust Generalization in Deep Reinforcement Learning

Abstract:Recently, empowered with the powerful capabilities of neural networks, reinforcement learning (RL) has successfully tackled numerous challenging tasks. However, while these models demonstrate enhanced decision-making abilities, they are increasingly prone to overfitting. For instance, a trained RL model often fails to generalize to even minor variations of the same task, such as a change in background color or other minor semantic differences. To address this issue, we propose a dual-agent adversarial policy learning framework, which allows agents to spontaneously learn the underlying semantics without introducing any human prior knowledge. Specifically, our framework involves a game process between two agents: each agent seeks to maximize the impact of perturbing on the opponent's policy by producing representation differences for the same state, while maintaining its own stability against such perturbations. This interaction encourages agents to learn generalizable policies, capable of handling irrelevant features from the high-dimensional observations. Extensive experimental results on the Procgen benchmark demonstrate that the adversarial process significantly improves the generalization performance of both agents, while also being applied to various RL algorithms, e.g., Proximal Policy Optimization (PPO). With the adversarial framework, the RL agent outperforms the baseline methods by a significant margin, especially in hard-level tasks, marking a significant step forward in the generalization capabilities of deep reinforcement learning.

Via

Access Paper or Ask Questions

CopRA: A Progressive LoRA Training Strategy

Oct 30, 2024

Zhan Zhuang, Xiequn Wang, Yulong Zhang, Wei Li, Yu Zhang, Ying Wei

Figure 1 for CopRA: A Progressive LoRA Training Strategy

Figure 2 for CopRA: A Progressive LoRA Training Strategy

Figure 3 for CopRA: A Progressive LoRA Training Strategy

Figure 4 for CopRA: A Progressive LoRA Training Strategy

Abstract:Low-Rank Adaptation (LoRA) is a parameter-efficient technique for rapidly fine-tuning foundation models. In standard LoRA training dynamics, models tend to quickly converge to a local optimum near the initialization. However, this local optimum may not be ideal for out-of-distribution data or tasks such as merging and pruning. In this work, we propose a novel progressive training strategy for LoRA with random layer dropping. This strategy also optimizes the Shapley value of LoRA parameters in each layer, treating each layer as a player in a cooperative game. We refer to this method as Cooperative LoRA (CopRA). Our experimental results demonstrate that parameters trained with CopRA exhibit linear mode connectivity, which enables efficient model merging. This also paves the way for federated learning and multi-task learning via LoRA merging. Additionally, by optimizing the Shapley value, CopRA shows superior performance in pruning tasks.

* Published in UniReps Workshop (Extended Abstract Track), NeurIPS 2024

Via

Access Paper or Ask Questions

Rethinking Guidance Information to Utilize Unlabeled Samples:A Label Encoding Perspective

Jun 05, 2024

Yulong Zhang, Yuan Yao, Shuhao Chen, Pengrong Jin, Yu Zhang, Jian Jin, Jiangang Lu

Figure 1 for Rethinking Guidance Information to Utilize Unlabeled Samples:A Label Encoding Perspective

Figure 2 for Rethinking Guidance Information to Utilize Unlabeled Samples:A Label Encoding Perspective

Figure 3 for Rethinking Guidance Information to Utilize Unlabeled Samples:A Label Encoding Perspective

Figure 4 for Rethinking Guidance Information to Utilize Unlabeled Samples:A Label Encoding Perspective

Abstract:Empirical Risk Minimization (ERM) is fragile in scenarios with insufficient labeled samples. A vanilla extension of ERM to unlabeled samples is Entropy Minimization (EntMin), which employs the soft-labels of unlabeled samples to guide their learning. However, EntMin emphasizes prediction discriminability while neglecting prediction diversity. To alleviate this issue, in this paper, we rethink the guidance information to utilize unlabeled samples. By analyzing the learning objective of ERM, we find that the guidance information for labeled samples in a specific category is the corresponding label encoding. Inspired by this finding, we propose a Label-Encoding Risk Minimization (LERM). It first estimates the label encodings through prediction means of unlabeled samples and then aligns them with their corresponding ground-truth label encodings. As a result, the LERM ensures both prediction discriminability and diversity, and it can be integrated into existing methods as a plugin. Theoretically, we analyze the relationships between LERM and ERM as well as EntMin. Empirically, we verify the superiority of the LERM under several label insufficient scenarios. The codes are available at https://github.com/zhangyl660/LERM.

* Accepted to ICML 2024

Via

Access Paper or Ask Questions

Large Language Models as Visual Cross-Domain Learners

Jan 06, 2024

Shuhao Chen, Yulong Zhang, Weisen Jiang, Jiangang Lu, Yu Zhang

Figure 1 for Large Language Models as Visual Cross-Domain Learners

Figure 2 for Large Language Models as Visual Cross-Domain Learners

Figure 3 for Large Language Models as Visual Cross-Domain Learners

Figure 4 for Large Language Models as Visual Cross-Domain Learners

Abstract:Recent advances achieved by deep learning models rely on the independent and identically distributed assumption, hindering their applications in real-world scenarios with domain shifts. To address the above issues, cross-domain learning aims at extracting domain-invariant knowledge to reduce the domain shift between training and testing data. However, in visual cross-domain learning, traditional methods concentrate solely on the image modality, neglecting the use of the text modality to alleviate the domain shift. In this work, we propose Large Language models as Visual cross-dOmain learners (LLaVO). LLaVO uses vision-language models to convert images into detailed textual descriptions. A large language model is then finetuned on textual descriptions of the source/target domain generated by a designed instruction template. Extensive experimental results on various cross-domain tasks under the domain generalization and unsupervised domain adaptation settings have demonstrated the effectiveness of the proposed method.

Via

Access Paper or Ask Questions

Domain-Guided Conditional Diffusion Model for Unsupervised Domain Adaptation

Sep 23, 2023

Yulong Zhang, Shuhao Chen, Weisen Jiang, Yu Zhang, Jiangang Lu, James T. Kwok

Abstract:Limited transferability hinders the performance of deep learning models when applied to new application scenarios. Recently, Unsupervised Domain Adaptation (UDA) has achieved significant progress in addressing this issue via learning domain-invariant features. However, the performance of existing UDA methods is constrained by the large domain shift and limited target domain data. To alleviate these issues, we propose DomAin-guided Conditional Diffusion Model (DACDM) to generate high-fidelity and diversity samples for the target domain. In the proposed DACDM, by introducing class information, the labels of generated samples can be controlled, and a domain classifier is further introduced in DACDM to guide the generated samples for the target domain. The generated samples help existing UDA methods transfer from the source domain to the target domain more easily, thus improving the transfer performance. Extensive experiments on various benchmarks demonstrate that DACDM brings a large improvement to the performance of existing UDA methods.

* Work in progress

Via

Access Paper or Ask Questions

Diffusion-based Target Sampler for Unsupervised Domain Adaptation

Mar 17, 2023

Yulong Zhang, Shuhao Chen, Yu Zhang, Jiangang Lu

Abstract:Limited transferability hinders the performance of deep learning models when applied to new application scenarios. Recently, unsupervised domain adaptation (UDA) has achieved significant progress in addressing this issue via learning domain-invariant features. However, large domain shifts and the sample scarcity in the target domain make existing UDA methods achieve suboptimal performance. To alleviate these issues, we propose a plug-and-play Diffusion-based Target Sampler (DTS) to generate high fidelity and diversity pseudo target samples. By introducing class-conditional information, the labels of the generated target samples can be controlled. The generated samples can well simulate the data distribution of the target domain and help existing UDA methods transfer from the source domain to the target domain more easily, thus improving the transfer performance. Extensive experiments on various benchmarks demonstrate that the performance of existing UDA methods can be greatly improved through the proposed DTS method.

Via

Access Paper or Ask Questions

Noise-robust blind reverberation time estimation using noise-aware time-frequency masking

Dec 09, 2021

Kaitong Zheng, Chengshi Zheng, Jinqiu Sang, Yulong Zhang, Xiaodong Li

Figure 1 for Noise-robust blind reverberation time estimation using noise-aware time-frequency masking

Figure 2 for Noise-robust blind reverberation time estimation using noise-aware time-frequency masking

Figure 3 for Noise-robust blind reverberation time estimation using noise-aware time-frequency masking

Figure 4 for Noise-robust blind reverberation time estimation using noise-aware time-frequency masking

Abstract:The reverberation time is one of the most important parameters used to characterize the acoustic property of an enclosure. In real-world scenarios, it is much more convenient to estimate the reverberation time blindly from recorded speech compared to the traditional acoustic measurement techniques using professional measurement instruments. However, the recorded speech is often corrupted by noise, which has a detrimental effect on the estimation accuracy of the reverberation time. To address this issue, this paper proposes a two-stage blind reverberation time estimation method based on noise-aware time-frequency masking. This proposed method has a good ability to distinguish the reverberation tails from the noise, thus improving the estimation accuracy of reverberation time in noisy scenarios. The simulated and real-world acoustic experimental results show that the proposed method significantly outperforms other methods in challenging scenarios.

Via

Access Paper or Ask Questions