Title: Cool-Fusion: Fuse Large Language Models without Training

Jul 29, 2024

Cong Liu, Xiaojun Quan, Yan Pan, Liang Lin, Weigang Wu, Xu Chen

Abstract: We focus on the problem of fusing two or more heterogeneous large language models (LLMs) to exploit their complementary strengths. One of the challenges of model fusion is its high computational load, i.e., the cost of fine-tuning or of aligning vocabularies via combinatorial optimization. To this end, we propose Cool-Fusion, a simple yet effective approach that fuses the knowledge of heterogeneous source LLMs to leverage their complementary strengths. Cool-Fusion is the first method that, like ensemble approaches, does not require any type of training. But unlike ensemble methods, it is applicable to any set of source LLMs that have different vocabularies. The basic idea is to have each source LLM individually generate tokens until the tokens can be decoded into a text segment that ends at a word boundary common to all source LLMs. Then, the source LLMs jointly rerank the generated text segments and select the best one, which is the fused text generation in one step. Extensive experiments are conducted across a variety of benchmark datasets. On GSM8K, Cool-Fusion increases accuracy over three strong source LLMs by a significant 8%-17.8%.
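The generate-then-rerank step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `ToyModel`, `generate_segment`, and `fuse_step` are hypothetical names, the toy models replay fixed subword pieces instead of running a real LLM, a trailing space stands in for a "word boundary common to all source LLMs", and averaging a perplexity-like score (lower is better) is an assumed way to "jointly rerank", since the abstract does not specify the scoring rule.

```python
from typing import Dict, Iterator, List, Sequence


class ToyModel:
    """Hypothetical stand-in for a source LLM with its own vocabulary."""

    def __init__(self, pieces: Sequence[str], scores: Dict[str, float]):
        self.pieces = list(pieces)   # subword pieces this toy model emits, in order
        self.scores = dict(scores)   # assumed perplexity-like scores (lower is better)

    def generate_tokens(self, prefix: str) -> Iterator[str]:
        # A real model would condition on the prefix; the toy ignores it.
        yield from self.pieces

    def score(self, prefix: str, segment: str) -> float:
        # Segments the toy model has no score for are treated as maximally unlikely.
        return self.scores.get(segment, float("inf"))


def generate_segment(model: ToyModel, prefix: str) -> str:
    """Accumulate decoded pieces until the text ends at a word boundary.

    Assumption: a trailing space marks a boundary shared by all models.
    """
    text = ""
    for piece in model.generate_tokens(prefix):
        text += piece
        if text.endswith(" "):
            break
    return text


def fuse_step(models: List[ToyModel], prefix: str) -> str:
    """One fusion step: each model proposes a segment, then all models rerank."""
    candidates = [generate_segment(m, prefix) for m in models]

    def mean_score(segment: str) -> float:
        # Assumed joint reranking rule: average score across all source models.
        return sum(m.score(prefix, segment) for m in models) / len(models)

    return min(candidates, key=mean_score)


# Two toy models that split words into different subword pieces.
model_a = ToyModel(["fu", "sion ", "models "], {"fusion ": 1.0, "fuse ": 3.0})
model_b = ToyModel(["fus", "e "], {"fusion ": 2.0, "fuse ": 2.5})

fused = fuse_step([model_a, model_b], "Cool-")
print(repr(fused))
```

In a full decoding loop, the selected segment would be appended to the shared prefix and the step repeated until every model signals the end of generation. Because candidates are compared as decoded text rather than as token ids, the source models never need a shared vocabulary.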