Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Runhua Jiang

Neural Parameter Search for Slimmer Fine-Tuned Models and Better Transfer

May 24, 2025

Guodong Du, Zitao Fang, Jing Li, Junlin Li, Runhua Jiang, Shuyang Yu, Yifei Guo, Yangneng Chen, Sim Kuan Goh, Ho-Kin Tang(+3 more)

Abstract:Foundation models and their checkpoints have significantly advanced deep learning, boosting performance across various applications. However, fine-tuned models often struggle outside their specific domains and exhibit considerable redundancy. Recent studies suggest that combining a pruned fine-tuned model with the original pre-trained model can mitigate forgetting, reduce interference when merging model parameters across tasks, and improve compression efficiency. In this context, developing an effective pruning strategy for fine-tuned models is crucial. Leveraging the advantages of the task vector mechanism, we preprocess fine-tuned models by calculating the differences between them and the original model. Recognizing that different task vector subspaces contribute variably to model performance, we introduce a novel method called Neural Parameter Search (NPS-Pruning) for slimming down fine-tuned models. This method enhances pruning efficiency by searching through neural parameters of task vectors within low-rank subspaces. Our method has three key applications: enhancing knowledge transfer through pairwise model interpolation, facilitating effective knowledge fusion via model merging, and enabling the deployment of compressed models that retain near-original performance while significantly reducing storage costs. Extensive experiments across vision, NLP, and multi-modal benchmarks demonstrate the effectiveness and robustness of our approach, resulting in substantial performance gains. The code is publicly available at: https://github.com/duguodong7/NPS-Pruning.

* Accepted by ACL2025 Main

Via

Access Paper or Ask Questions

Parameter Competition Balancing for Model Merging

Oct 03, 2024

Guodong Du, Junlin Lee, Jing Li, Runhua Jiang, Yifei Guo, Shuyang Yu, Hanting Liu, Sim Kuan Goh, Ho-Kin Tang, Daojing He(+1 more)

Figure 1 for Parameter Competition Balancing for Model Merging

Figure 2 for Parameter Competition Balancing for Model Merging

Figure 3 for Parameter Competition Balancing for Model Merging

Figure 4 for Parameter Competition Balancing for Model Merging

Abstract:While fine-tuning pretrained models has become common practice, these models often underperform outside their specific domains. Recently developed model merging techniques enable the direct integration of multiple models, each fine-tuned for distinct tasks, into a single model. This strategy promotes multitasking capabilities without requiring retraining on the original datasets. However, existing methods fall short in addressing potential conflicts and complex correlations between tasks, especially in parameter-level adjustments, posing a challenge in effectively balancing parameter competition across various tasks. This paper introduces an innovative technique named PCB-Merging (Parameter Competition Balancing), a lightweight and training-free technique that adjusts the coefficients of each parameter for effective model merging. PCB-Merging employs intra-balancing to gauge parameter significance within individual tasks and inter-balancing to assess parameter similarities across different tasks. Parameters with low importance scores are dropped, and the remaining ones are rescaled to form the final merged model. We assessed our approach in diverse merging scenarios, including cross-task, cross-domain, and cross-training configurations, as well as out-of-domain generalization. The experimental results reveal that our approach achieves substantial performance enhancements across multiple modalities, domains, model sizes, number of tasks, fine-tuning forms, and large language models, outperforming existing model merging methods. The code is publicly available at: \url{https://github.com/duguodong7/pcb-merging}.

* Accepted by NeurIPS2024

Via

Access Paper or Ask Questions

Impacts of Darwinian Evolution on Pre-trained Deep Neural Networks

Aug 10, 2024

Guodong Du, Runhua Jiang, Senqiao Yang, Haoyang Li, Wei Chen, Keren Li, Sim Kuan Goh, Ho-Kin Tang

Abstract:Darwinian evolution of the biological brain is documented through multiple lines of evidence, although the modes of evolutionary changes remain unclear. Drawing inspiration from the evolved neural systems (e.g., visual cortex), deep learning models have demonstrated superior performance in visual tasks, among others. While the success of training deep neural networks has been relying on back-propagation (BP) and its variants to learn representations from data, BP does not incorporate the evolutionary processes that govern biological neural systems. This work proposes a neural network optimization framework based on evolutionary theory. Specifically, BP-trained deep neural networks for visual recognition tasks obtained from the ending epochs are considered the primordial ancestors (initial population). Subsequently, the population evolved with differential evolution. Extensive experiments are carried out to examine the relationships between Darwinian evolution and neural network optimization, including the correspondence between datasets, environment, models, and living species. The empirical results show that the proposed framework has positive impacts on the network, with reduced over-fitting and an order of magnitude lower time complexity compared to BP. Moreover, the experiments show that the proposed framework performs well on deep neural networks and big datasets.

* This work has been submitted to the IEEE for possible publication

Via

Access Paper or Ask Questions

Knowledge Fusion By Evolving Weights of Language Models

Jun 18, 2024

Guodong Du, Jing Li, Hanting Liu, Runhua Jiang, Shuyang Yu, Yifei Guo, Sim Kuan Goh, Ho-Kin Tang

Figure 1 for Knowledge Fusion By Evolving Weights of Language Models

Figure 2 for Knowledge Fusion By Evolving Weights of Language Models

Figure 3 for Knowledge Fusion By Evolving Weights of Language Models

Figure 4 for Knowledge Fusion By Evolving Weights of Language Models

Abstract:Fine-tuning pre-trained language models, particularly large language models, demands extensive computing resources and can result in varying performance outcomes across different domains and datasets. This paper examines the approach of integrating multiple models from diverse training scenarios into a unified model. This unified model excels across various data domains and exhibits the ability to generalize well on out-of-domain data. We propose a knowledge fusion method named Evolver, inspired by evolutionary algorithms, which does not need further training or additional training data. Specifically, our method involves aggregating the weights of different language models into a population and subsequently generating offspring models through mutation and crossover operations. These offspring models are then evaluated against their parents, allowing for the preservation of those models that show enhanced performance on development datasets. Importantly, our model evolving strategy can be seamlessly integrated with existing model merging frameworks, offering a versatile tool for model enhancement. Experimental results on mainstream language models (i.e., encoder-only, decoder-only, encoder-decoder) reveal that Evolver outperforms previous state-of-the-art models by large margins. The code is publicly available at {https://github.com/duguodong7/model-evolution}.

* Accepted by ACL2024 Findings

Via

Access Paper or Ask Questions

CADE: Cosine Annealing Differential Evolution for Spiking Neural Network

Jun 04, 2024

Runhua Jiang, Guodong Du, Shuyang Yu, Yifei Guo, Sim Kuan Goh, Ho-Kin Tang

Figure 1 for CADE: Cosine Annealing Differential Evolution for Spiking Neural Network

Figure 2 for CADE: Cosine Annealing Differential Evolution for Spiking Neural Network

Figure 3 for CADE: Cosine Annealing Differential Evolution for Spiking Neural Network

Figure 4 for CADE: Cosine Annealing Differential Evolution for Spiking Neural Network

Abstract:Spiking neural networks (SNNs) have gained prominence for their potential in neuromorphic computing and energy-efficient artificial intelligence, yet optimizing them remains a formidable challenge for gradient-based methods due to their discrete, spike-based computation. This paper attempts to tackle the challenges by introducing Cosine Annealing Differential Evolution (CADE), designed to modulate the mutation factor (F) and crossover rate (CR) of differential evolution (DE) for the SNN model, i.e., Spiking Element Wise (SEW) ResNet. Extensive empirical evaluations were conducted to analyze CADE. CADE showed a balance in exploring and exploiting the search space, resulting in accelerated convergence and improved accuracy compared to existing gradient-based and DE-based methods. Moreover, an initialization method based on a transfer learning setting was developed, pretraining on a source dataset (i.e., CIFAR-10) and fine-tuning the target dataset (i.e., CIFAR-100), to improve population diversity. It was found to further enhance CADE for SNN. Remarkably, CADE elevates the performance of the highest accuracy SEW model by an additional 0.52 percentage points, underscoring its effectiveness in fine-tuning and enhancing SNNs. These findings emphasize the pivotal role of a scheduler for F and CR adjustment, especially for DE-based SNN. Source Code on Github: https://github.com/Tank-Jiang/CADE4SNN.

Via

Access Paper or Ask Questions

Generalizing to Out-of-Sample Degradations via Model Reprogramming

Mar 09, 2024

Runhua Jiang, Yahong Han

Figure 1 for Generalizing to Out-of-Sample Degradations via Model Reprogramming

Figure 2 for Generalizing to Out-of-Sample Degradations via Model Reprogramming

Figure 3 for Generalizing to Out-of-Sample Degradations via Model Reprogramming

Figure 4 for Generalizing to Out-of-Sample Degradations via Model Reprogramming

Abstract:Existing image restoration models are typically designed for specific tasks and struggle to generalize to out-of-sample degradations not encountered during training. While zero-shot methods can address this limitation by fine-tuning model parameters on testing samples, their effectiveness relies on predefined natural priors and physical models of specific degradations. Nevertheless, determining out-of-sample degradations faced in real-world scenarios is always impractical. As a result, it is more desirable to train restoration models with inherent generalization ability. To this end, this work introduces the Out-of-Sample Restoration (OSR) task, which aims to develop restoration models capable of handling out-of-sample degradations. An intuitive solution involves pre-translating out-of-sample degradations to known degradations of restoration models. However, directly translating them in the image space could lead to complex image translation issues. To address this issue, we propose a model reprogramming framework, which translates out-of-sample degradations by quantum mechanic and wave functions. Specifically, input images are decoupled as wave functions of amplitude and phase terms. The translation of out-of-sample degradation is performed by adapting the phase term. Meanwhile, the image content is maintained and enhanced in the amplitude term. By taking these two terms as inputs, restoration models are able to handle out-of-sample degradations without fine-tuning. Through extensive experiments across multiple evaluation cases, we demonstrate the effectiveness and flexibility of our proposed framework. Our codes are available at \href{https://github.com/ddghjikle/Out-of-sample-restoration}{Github}.

Via

Access Paper or Ask Questions