Title: Crayon: Customized On-Device LLM via Instant Adapter Blending and Edge-Server Hybrid Inference

Jun 11, 2024

Jihwan Bang, Juntae Lee, Kyuhong Shim, Seunghan Yang, Simyung Chang

Figures 1 to 4 for Crayon: Customized On-Device LLM via Instant Adapter Blending and Edge-Server Hybrid Inference

Abstract: Customizing large language models (LLMs) for user-specified tasks is becoming increasingly important. However, maintaining all the customized LLMs on cloud servers incurs substantial memory and computational overhead, and uploading user data raises privacy concerns. On-device LLMs offer a promising solution by mitigating these issues. Yet the performance of on-device LLMs is inherently constrained by the limitations of small-scale models. To overcome these restrictions, we propose Crayon, a novel approach for on-device LLM customization. Crayon first constructs a pool of diverse base adapters and then instantly blends them into a customized adapter without extra training. In addition, we develop a device-server hybrid inference strategy, which deftly allocates more demanding queries or non-customized tasks to a larger, more capable LLM on a server. This ensures optimal performance without sacrificing the benefits of on-device customization. We carefully craft a novel benchmark from multiple question-answer datasets and show the efficacy of our method for LLM customization.

* ACL 2024 Main
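
The two ideas in the abstract, blending a pool of base adapters without training and sending some queries to a server model, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how blending weights are chosen or how queries are routed, so the normalized weighted average in `blend_adapters`, the confidence score, and the `threshold` in `route_query` are all hypothetical stand-ins chosen for illustration.

```python
from typing import List

Vector = List[float]


def blend_adapters(base_adapters: List[Vector], weights: List[float]) -> Vector:
    """Combine base adapter parameters into one adapter with no training step.

    ASSUMPTION: a normalized weighted average of flattened adapter parameters.
    The abstract does not specify the actual blending rule.
    """
    if not base_adapters or len(base_adapters) != len(weights):
        raise ValueError("need one weight per base adapter")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dim = len(base_adapters[0])
    if any(len(adapter) != dim for adapter in base_adapters):
        raise ValueError("all adapters must have the same number of parameters")

    blended = [0.0] * dim
    for adapter, weight in zip(base_adapters, weights):
        scale = weight / total
        for i, value in enumerate(adapter):
            blended[i] += scale * value
    return blended


def route_query(confidence: float, threshold: float = 0.5) -> str:
    """Decide where a query is answered.

    ASSUMPTION: the on-device model exposes some confidence score and a fixed
    threshold separates easy queries from demanding ones. Both are hypothetical.
    """
    return "device" if confidence >= threshold else "server"


if __name__ == "__main__":
    pool = [[1.0, 0.0], [0.0, 1.0]]       # two toy base adapters
    custom = blend_adapters(pool, [1.0, 3.0])
    print(custom, route_query(0.9), route_query(0.2))
```

In this sketch, blending costs a single pass over the adapter parameters, which is the sense in which customization can be "instant"; the routing function only shows where a device-or-server decision would sit in the pipeline.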
