
Title: A Unified Approach to Routing and Cascading for LLMs

Oct 14, 2024

Jasper Dekoninck, Maximilian Baader, Martin Vechev



Abstract: The widespread applicability of large language models (LLMs) has increased the availability of many fine-tuned models of various sizes targeting specific tasks. Given a set of such specialized models, maximizing overall performance requires an optimal strategy for selecting the right model for a given user query. An effective strategy could drastically increase overall performance and even offer improvements over a single large monolithic model. Existing approaches typically fall into two categories: routing, where a single model is selected for each query, and cascading, which runs a sequence of increasingly larger models until a satisfactory answer is obtained. However, both have notable limitations: routing commits to an initial model without flexibility, while cascading requires executing every model in sequence, which can be inefficient. Additionally, the conditions under which these strategies are provably optimal remain unclear. In this work, we derive optimal strategies for both routing and cascading. Building on this analysis, we propose a novel approach called cascade routing, which combines the adaptability of routing with the cost-efficiency of cascading. Our experiments demonstrate that cascade routing consistently outperforms both routing and cascading across a variety of settings, both improving output quality and lowering computational cost, thus offering a unified and efficient solution to the model selection problem.
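
To make the two baseline strategies named in the abstract concrete, here is a minimal Python sketch. It is a toy illustration under stated assumptions and not the authors' method. The `Model` class, the `predicted_quality` estimator, the `judge` function, the `cost_weight` trade-off, and the `threshold` are all hypothetical names introduced here, and per-query cost stands in for model size. Cascade routing itself is not sketched, since the abstract does not specify its decision rule.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Model:
    """Hypothetical stand-in for an LLM; every field is an assumption."""
    name: str
    cost: float                                 # assumed per-query cost
    answer: Callable[[str], str]                # runs the model on a query
    predicted_quality: Callable[[str], float]   # estimate made before running


def route(models: List[Model], query: str, cost_weight: float) -> Tuple[str, float]:
    """Routing: pick a single model up front and run only that model."""
    best = max(models, key=lambda m: m.predicted_quality(query) - cost_weight * m.cost)
    return best.answer(query), best.cost


def cascade(models: List[Model], query: str,
            judge: Callable[[str, str], float], threshold: float) -> Tuple[str, float]:
    """Cascading: run models from cheapest to most expensive and stop at the
    first answer the (assumed) judge scores at or above the threshold."""
    total_cost = 0.0
    result = ""
    for model in sorted(models, key=lambda m: m.cost):
        result = model.answer(query)
        total_cost += model.cost
        if judge(query, result) >= threshold:
            break
    return result, total_cost


# Toy models: the small one is only predicted to do well on short queries.
small = Model("small", 1.0, lambda q: "small:" + q,
              lambda q: 0.9 if len(q) < 10 else 0.3)
large = Model("large", 5.0, lambda q: "large:" + q, lambda q: 0.95)


def toy_judge(query: str, answer: str) -> float:
    """Hypothetical judge: accepts large-model answers or any short query."""
    return 0.9 if answer.startswith("large") or len(query) < 10 else 0.2
```

In this toy setup, routing pays only for the one model it selects but cannot recover if that choice turns out to be poor, whereas cascading can escalate to the larger model at the price of also paying for the smaller model it ran first. These are the limitations the abstract attributes to each strategy.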
