Abstract: While fine-tuning pretrained models has become common practice, these models often underperform outside their specific domains. Recently developed model merging techniques enable the direct integration of multiple models, each fine-tuned for a distinct task, into a single model. This strategy promotes multitasking capabilities without requiring retraining on the original datasets. However, existing methods fall short in addressing potential conflicts and complex correlations between tasks, especially in parameter-level adjustments, making it challenging to balance parameter competition across tasks effectively. This paper introduces PCB-Merging (Parameter Competition Balancing), a lightweight, training-free technique that adjusts the coefficient of each parameter for effective model merging. PCB-Merging employs intra-balancing to gauge parameter significance within individual tasks and inter-balancing to assess parameter similarities across different tasks. Parameters with low importance scores are dropped, and the remaining ones are rescaled to form the final merged model. We assessed our approach in diverse merging scenarios, including cross-task, cross-domain, and cross-training configurations, as well as out-of-domain generalization. The experimental results show substantial performance gains across multiple modalities, domains, model sizes, task counts, fine-tuning forms, and large language models, outperforming existing model merging methods. The code is publicly available at: \url{https://github.com/duguodong7/pcb-merging}.
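To make the balancing idea concrete, below is a minimal NumPy sketch of parameter-competition balancing over task vectors (fine-tuned weights minus pretrained weights). It is an illustration under our own assumptions, not the paper's exact formulation: the softmax-based intra score, the product-based inter score, and the `keep_ratio` threshold are all illustrative choices.

```python
import numpy as np

def pcb_merge(task_vectors, keep_ratio=0.8, temperature=1.0):
    """Toy sketch of parameter-competition balancing (illustrative, not the
    paper's exact math). task_vectors: (n_tasks, n_params), row t holding
    task t's fine-tuned weights minus the shared pretrained weights."""
    tv = np.asarray(task_vectors, dtype=np.float64)
    n_tasks, _ = tv.shape

    # Intra-balancing: importance of each parameter within its own task,
    # here a (numerically stabilized) softmax over normalized magnitudes.
    mag = tv ** 2 / ((tv ** 2).mean(axis=1, keepdims=True) + 1e-12)
    logits = temperature * mag
    logits -= logits.max(axis=1, keepdims=True)
    intra = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

    # Inter-balancing: sign-sensitive agreement with the other tasks,
    # squashed to (0, 1); parameters that conflict across tasks score low.
    inter = np.stack([(tv[t] * np.delete(tv, t, axis=0)).mean(axis=0)
                      for t in range(n_tasks)])
    inter = 1.0 / (1.0 + np.exp(-inter))

    score = intra * inter
    # Drop the lowest-scoring parameters per task, rescale the survivors.
    thresh = np.quantile(score, 1.0 - keep_ratio, axis=1, keepdims=True)
    weights = score * (score >= thresh)
    weights /= weights.sum(axis=0, keepdims=True) + 1e-12
    return (weights * tv).sum(axis=0)  # merged task vector

# merged weights = pretrained weights + pcb_merge(stacked task vectors)
```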
Abstract: Darwinian evolution of the biological brain is documented through multiple lines of evidence, although the modes of evolutionary change remain unclear. Drawing inspiration from evolved neural systems (e.g., the visual cortex), deep learning models have demonstrated superior performance in visual tasks, among others. While the success of training deep neural networks has relied on back-propagation (BP) and its variants to learn representations from data, BP does not incorporate the evolutionary processes that govern biological neural systems. This work proposes a neural network optimization framework based on evolutionary theory. Specifically, BP-trained deep neural networks for visual recognition tasks, obtained from the final training epochs, serve as the primordial ancestors (initial population). Subsequently, the population is evolved with differential evolution. Extensive experiments are carried out to examine the relationships between Darwinian evolution and neural network optimization, including the correspondence between datasets and environments and between models and living species. The empirical results show that the proposed framework has positive impacts on the network, with reduced over-fitting and an order of magnitude lower time complexity compared to BP. Moreover, the experiments show that the proposed framework scales well to deep neural networks and large datasets.
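A minimal sketch of the evolutionary stage we read from this abstract: flattened weights from a late BP epoch seed the population, which is then refined with classic DE/rand/1 differential evolution. The fitness callable, perturbation scale, and hyperparameters are placeholders, not the paper's settings.

```python
import numpy as np

def evolve_from_ancestor(fitness, ancestor, pop_size=20, F=0.5, CR=0.9,
                         generations=50, sigma=0.01, rng=None):
    """Sketch: evolve flattened network weights from a BP-trained ancestor.

    fitness: callable mapping a weight vector to a score (higher is better),
    e.g., validation accuracy. `ancestor` is the weight vector from a late
    BP epoch; the initial population is a cloud of small perturbations of it.
    """
    rng = np.random.default_rng(rng)
    dim = ancestor.size
    pop = ancestor + sigma * rng.standard_normal((pop_size, dim))
    fit = np.array([fitness(x) for x in pop])
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = pop[rng.choice(pop_size, 3, replace=False)]
            mutant = a + F * (b - c)                 # DE/rand/1 mutation
            cross = rng.random(dim) < CR             # binomial crossover
            cross[rng.integers(dim)] = True          # keep at least one gene
            trial = np.where(cross, mutant, pop[i])
            f = fitness(trial)
            if f >= fit[i]:                          # greedy selection
                pop[i], fit[i] = trial, f
    return pop[fit.argmax()]
```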
Abstract: Drawing inspiration from the philosophy of Yi Jing, the Yin-Yang pair optimization (YYPO) algorithm has been shown to achieve competitive performance in single-objective optimization, in addition to the advantage of low time complexity when compared to other population-based meta-heuristics. Building upon the reversal concept in Yi Jing, we propose the novel Yi optimization (YI) algorithm. Specifically, we enhance the Yin-Yang pair in YYPO with a proposed Yi-point, which uses a Cauchy flight to update the solution, implementing both the harmony and reversal concepts of Yi Jing. The proposed Yi-point balances exploration and exploitation in the optimization process. To examine YI, we use the IEEE CEC 2017 benchmarks and compare YI against dynamical YYPO, the CV1.0 optimizer, and four classical optimizers, i.e., differential evolution, the genetic algorithm, particle swarm optimization, and simulated annealing. According to the experimental results, YI shows highly competitive performance while retaining low time complexity. The results of this work have implications for enhancing a meta-heuristic optimizer using the philosophy of Yi Jing. While this work implements only certain aspects of Yi Jing, we envisage enhanced performance by incorporating other aspects.
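The core update can be sketched in a few lines: a heavy-tailed Cauchy step around the current solution, accepted only on improvement. The function name and the greedy acceptance rule are our assumptions for illustration; minimization is assumed.

```python
import numpy as np

def yi_point_step(x, fitness, scale=0.1, bounds=(-100.0, 100.0), rng=None):
    """Sketch of a Cauchy-flight update at a hypothetical Yi-point.

    Heavy-tailed Cauchy steps occasionally make long jumps (exploration)
    while most steps stay small (exploitation); the move is kept only if
    it improves the objective (minimization assumed)."""
    rng = np.random.default_rng(rng)
    step = scale * rng.standard_cauchy(x.shape)   # heavy-tailed proposal
    candidate = np.clip(x + step, *bounds)
    return candidate if fitness(candidate) < fitness(x) else x
```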
Abstract: Fine-tuning pre-trained language models, particularly large language models, demands extensive computing resources and can result in varying performance outcomes across different domains and datasets. This paper examines the approach of integrating multiple models from diverse training scenarios into a unified model. This unified model excels across various data domains and generalizes well to out-of-domain data. We propose a knowledge fusion method named Evolver, inspired by evolutionary algorithms, which requires neither further training nor additional training data. Specifically, our method aggregates the weights of different language models into a population and subsequently generates offspring models through mutation and crossover operations. These offspring models are then evaluated against their parents, preserving those that show enhanced performance on development datasets. Importantly, our model evolving strategy can be seamlessly integrated with existing model merging frameworks, offering a versatile tool for model enhancement. Experimental results on mainstream language models (i.e., encoder-only, decoder-only, and encoder-decoder) reveal that Evolver outperforms previous state-of-the-art models by large margins. The code is publicly available at \url{https://github.com/duguodong7/model-evolution}.
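A compact sketch of the evolve-then-select loop described above, applying DE-style mutation and crossover directly to flattened checkpoint weights. The `dev_score` callable, the hyperparameters, and the replacement rule are illustrative assumptions, not Evolver's exact recipe.

```python
import numpy as np

def evolve_models(models, dev_score, generations=30, F=0.5, CR=0.5, rng=None):
    """Sketch of an evolve-then-select loop over model checkpoints.

    models: (n, d) array of flattened weights from differently fine-tuned
    checkpoints. dev_score: callable returning performance on a dev set.
    Offspring that beat their parent replace it; no gradient steps involved.
    """
    rng = np.random.default_rng(rng)
    pop = np.asarray(models, dtype=np.float64).copy()
    n, d = pop.shape
    scores = np.array([dev_score(m) for m in pop])
    for _ in range(generations):
        for i in range(n):
            a, b, c = pop[rng.choice(n, 3, replace=False)]
            child = np.where(rng.random(d) < CR,     # crossover mask
                             a + F * (b - c),        # mutated donor
                             pop[i])
            s = dev_score(child)
            if s > scores[i]:                        # keep improved offspring
                pop[i], scores[i] = child, s
    return pop[scores.argmax()]                      # best evolved model
```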
Abstract: Spiking neural networks (SNNs) have gained prominence for their potential in neuromorphic computing and energy-efficient artificial intelligence, yet optimizing them remains a formidable challenge for gradient-based methods due to their discrete, spike-based computation. This paper tackles this challenge by introducing Cosine Annealing Differential Evolution (CADE), designed to modulate the mutation factor (F) and crossover rate (CR) of differential evolution (DE) for the SNN model, i.e., Spiking Element Wise (SEW) ResNet. Extensive empirical evaluations show that CADE balances exploration and exploitation of the search space, resulting in accelerated convergence and improved accuracy compared to existing gradient-based and DE-based methods. Moreover, an initialization method based on a transfer learning setting was developed, pretraining on a source dataset (i.e., CIFAR-10) and fine-tuning on the target dataset (i.e., CIFAR-100) to improve population diversity; it was found to further enhance CADE for SNNs. Remarkably, CADE elevates the performance of the highest-accuracy SEW model by an additional 0.52 percentage points, underscoring its effectiveness in fine-tuning and enhancing SNNs. These findings emphasize the pivotal role of a scheduler for F and CR adjustment, especially for DE-based SNN optimization. Source code is available on GitHub: https://github.com/Tank-Jiang/CADE4SNN.
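The scheduler itself is small; here is a hedged sketch of cosine-annealing F and CR over generations. The endpoints, the annealing direction, and the per-parameter ranges are assumptions, not CADE's published settings.

```python
import math

def cosine_annealed(t, T, lo=0.1, hi=0.9):
    """Cosine schedule from `hi` down to `lo` over T generations (a sketch;
    the exact endpoints and direction used by CADE are assumptions here)."""
    return lo + 0.5 * (hi - lo) * (1 + math.cos(math.pi * t / T))

# Example: anneal DE's mutation factor F and crossover rate CR per generation.
T = 100
for t in range(T):
    F = cosine_annealed(t, T, lo=0.1, hi=0.9)
    CR = cosine_annealed(t, T, lo=0.2, hi=1.0)
    # ... run one DE generation on the SNN weights with this F and CR ...
```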
Abstract: Drawing inspiration from the philosophy of Yi Jing, Yin-Yang pair optimization (YYPO) has been shown to achieve competitive performance in single-objective optimization. It also has the advantage of low time complexity compared to other population-based optimizers. As a conceptual extension of YYPO, we propose the novel Yi optimization (YI) algorithm, a competitive non-population-based optimizer. Incorporating both the harmony and reversal concepts of Yi Jing, we replace the Yin-Yang pair with a Yi-point, in which we use a Levy flight to update the solution, balancing exploration and exploitation in the optimization process. As a conceptual prototype, we examine YI on the IEEE CEC 2017 benchmark and compare its performance with CV1.0, a Levy flight-based optimizer; dynamical Yin-Yang pair optimization, the state of the art in the YYPO family; and a few classical optimizers. According to the experimental results, YI shows highly competitive performance while retaining low time complexity. Hence, the results of this work have implications for enhancing meta-heuristic optimizers using the philosophy of Yi Jing, which deserves further research attention.
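The Levy flight that distinguishes this variant is commonly generated with Mantegna's algorithm; the sketch below shows that standard construction. Whether YI uses exactly this recipe (and this beta) is an assumption.

```python
import math
import numpy as np

def levy_step(shape, beta=1.5, rng=None):
    """Mantegna's algorithm for Levy-stable step lengths (a standard recipe;
    that YI uses exactly this construction is an assumption)."""
    rng = np.random.default_rng(rng)
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma = (num / den) ** (1 / beta)              # scale for the numerator
    u = rng.normal(0.0, sigma, shape)
    v = rng.normal(0.0, 1.0, shape)
    return u / np.abs(v) ** (1 / beta)             # heavy-tailed step

# e.g., candidate = current_solution + 0.1 * levy_step(current_solution.shape)
```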
Abstract: Approach and landing accidents (ALAs), alongside runway excursions, hard landings, and landings short of the runway, have resulted in a significant number of hull losses worldwide. Technologies (e.g., the instrument landing system) and procedures (e.g., stabilized approach criteria) have been developed to reduce ALA risks. In this paper, we propose a data-driven method to learn and interpret a flight's 4D approach and landing parameters, providing comprehensible and actionable insights into landing dynamics for aircrew and air traffic controllers (ATCOs) in real time. Specifically, we develop a tunnel Gaussian process (TGP) model to gain insight into the landing dynamics of aircraft using advanced surface movement guidance and control system (A-SMGCS) data, which in turn indicates flight stability. TGP hybridizes the strengths of the sparse variational Gaussian process and the polar Gaussian process to learn from large amounts of data in cylindrical coordinates. We examine TGP qualitatively and quantitatively on two synthesized complex trajectory datasets; empirically, TGP reconstructs the structure of the synthesized trajectories. When applied to operational A-SMGCS data, TGP provides a probabilistic description of landing dynamics and interpretable 4D tunnel views of approach and landing parameters. The 4D tunnel views can facilitate the analysis of procedure adherence and augment existing aircrew and ATCO displays during the approach and landing procedures, enabling necessary corrective actions. The proposed TGP model can also provide insights and aid the design of landing procedures in complex runway configurations such as parallel approaches. Moreover, extending the TGP model to next-generation landing systems (e.g., the GNSS landing system) is straightforward. An interactive visualization of our findings is available at https://simkuangoh.github.io/TunnelGP/.
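To illustrate why cylindrical coordinates matter for the GP, here is a toy product kernel over (r, theta, z): squared-exponential in r and z, periodic in theta so that tracks on either side of the +/- pi wrap remain close. This is only a didactic stand-in; the actual TGP hybridizes sparse variational and polar GPs as described above.

```python
import numpy as np

def tunnel_kernel(a, b, ls=(1.0, 1.0, 1.0)):
    """Toy kernel for cylindrical inputs (r, theta, z): RBF in r and z,
    periodic in theta. A didactic stand-in for TGP, not its actual kernel.
    a: (n, 3) and b: (m, 3) arrays of points; returns an (n, m) Gram matrix."""
    dr = (a[:, None, 0] - b[None, :, 0]) / ls[0]
    dth = a[:, None, 1] - b[None, :, 1]
    dz = (a[:, None, 2] - b[None, :, 2]) / ls[2]
    k_theta = np.exp(-2.0 * (np.sin(dth / 2.0) / ls[1]) ** 2)  # 2*pi-periodic
    return np.exp(-0.5 * dr ** 2) * k_theta * np.exp(-0.5 * dz ** 2)
```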