Title: SpecFuse: Ensembling Large Language Models via Next-Segment Prediction

Dec 10, 2024

Bo Lv, Chen Tang, Yanan Zhang, Xin Liu, Yue Yu, Ping Luo

Abstract: Ensembles of generative large language models (LLMs) can integrate the strengths of different LLMs to compensate for the limitations of individual models. However, recent work has focused on training an additional fusion model to combine complete responses from multiple LLMs, failing to tap into their collaborative potential to generate higher-quality responses. Moreover, as the additional fusion model is trained on a specialized dataset, these methods struggle to generalize to open-domain queries from online users. In this paper, we propose SpecFuse, a novel ensemble framework that outputs the fused result by iteratively producing the next segment through collaboration among LLMs. This is achieved through cyclic execution of its inference and verification components. In each round, the inference component invokes each base LLM to generate candidate segments in parallel, and the verification component calls these LLMs again to predict the ranking of the segments. The top-ranked segment is then broadcast to all LLMs, encouraging them to generate higher-quality segments in the next round. This approach also allows the base LLMs to be plug-and-play, without any training or adaptation, avoiding generalization limitations. Furthermore, to conserve computational resources, we propose a model exit mechanism that, while responding to each query, dynamically excludes models that performed poorly in previous rounds. In this way, it effectively reduces the number of model calls while maintaining overall performance.

* 15 pages, 5 figures
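
The inference/verification cycle described in the abstract can be sketched roughly as follows. This is only an illustrative toy, not the authors' implementation: the `BaseModel` interface, the `specfuse_sketch` function, the summed-score ranking, and the consecutive-loss exit rule (`exit_after`) are all assumptions made for the example, since the abstract does not specify how segments are ranked or when a model is excluded. The two scripted "models" stand in for real LLMs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical interface (an assumption, not from the paper):
#   generate(query, prefix)         -> next candidate segment, "" when finished
#   score(query, prefix, candidate) -> float, higher means better
@dataclass
class BaseModel:
    name: str
    generate: Callable[[str, str], str]
    score: Callable[[str, str, str], float]

def specfuse_sketch(query: str, models: List[BaseModel],
                    max_rounds: int = 8, exit_after: int = 2) -> Tuple[str, int]:
    """Return (fused text, number of model calls) for a toy segment-level ensemble."""
    active = list(models)
    losses: Dict[str, int] = {m.name: 0 for m in models}  # consecutive rounds lost
    prefix, calls = "", 0
    for _ in range(max_rounds):
        # Inference: every active model proposes the next segment.
        candidates = {m.name: m.generate(query, prefix) for m in active}
        calls += len(active)
        candidates = {n: s for n, s in candidates.items() if s}
        if not candidates:
            break  # all active models signalled the end of the response
        # Verification: every active model scores every candidate.
        # Summing scores is a placeholder ranking rule (assumption).
        totals = {n: sum(m.score(query, prefix, seg) for m in active)
                  for n, seg in candidates.items()}
        calls += len(active)  # counted as one verification call per model
        winner = max(totals, key=totals.get)
        # Broadcast: the winning segment becomes part of the shared prefix.
        prefix += candidates[winner]
        # Model exit (placeholder rule): drop models that lost
        # `exit_after` rounds in a row; the winner always stays.
        for m in active:
            losses[m.name] = 0 if m.name == winner else losses[m.name] + 1
        active = [m for m in active if losses[m.name] < exit_after]
    return prefix, calls

# Toy stand-ins for LLMs: each emits a fixed segment until the prefix is long enough.
def scripted(segment: str) -> Callable[[str, str], str]:
    return lambda query, prefix: segment if len(prefix) < 15 else ""

def prefers_good(query: str, prefix: str, candidate: str) -> float:
    return 1.0 if candidate.startswith("good") else 0.0

toy_models = [
    BaseModel("strong", scripted("good "), prefers_good),
    BaseModel("weak", scripted("bad "), prefers_good),
]

fused, calls_with_exit = specfuse_sketch("any query", toy_models, exit_after=2)
_, calls_without_exit = specfuse_sketch("any query", toy_models, exit_after=100)
```

In this toy run the weak model is dropped after losing two rounds in a row, so the same fused text is produced with fewer model calls than when the exit rule is effectively disabled, which mirrors the trade-off the abstract describes.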
