Abstract:Large Language Models (LLMs) have revolutionized intelligent services by enabling logical reasoning, tool use, and interaction with external systems as agents. The advancement of LLMs is frequently hindered by the scarcity of high-quality data, much of which is inherently sensitive. Federated learning (FL) offers a potential solution by facilitating the collaborative training of distributed LLMs while safeguarding private data. However, FL frameworks face significant bandwidth and computational demands, along with challenges from heterogeneous data distributions. The emerging in-context learning capability of LLMs offers a promising approach by aggregating natural language rather than bulky model parameters. Yet, this method risks privacy leakage, as it necessitates the collection and presentation of data samples from various clients during aggregation. In this paper, we propose a novel privacy-preserving Federated In-Context LLM Agent Learning (FICAL) algorithm, which to our best knowledge for the first work unleashes the power of in-context learning to train diverse LLM agents through FL. In our design, knowledge compendiums generated by a novel LLM-enhanced Knowledge Compendiums Generation (KCG) module are transmitted between clients and the server instead of model parameters in previous FL methods. Apart from that, an incredible Retrieval Augmented Generation (RAG) based Tool Learning and Utilizing (TLU) module is designed and we incorporate the aggregated global knowledge compendium as a teacher to teach LLM agents the usage of tools. We conducted extensive experiments and the results show that FICAL has competitive performance compared to other SOTA baselines with a significant communication cost decrease of $\mathbf{3.33\times10^5}$ times.
Abstract:The next generation of communication is envisioned to be intelligent communication, that can replace traditional symbolic communication, where highly condensed semantic information considering both source and channel will be extracted and transmitted with high efficiency. The recent popular large models such as GPT4 and the boosting learning techniques lay a solid foundation for the intelligent communication, and prompt the practical deployment of it in the near future. Given the characteristics of "training once and widely use" of those multimodal large language models, we argue that a pay-as-you-go service mode will be suitable in this context, referred to as Large Model as a Service (LMaaS). However, the trading and pricing problem is quite complex with heterogeneous and dynamic customer environments, making the pricing optimization problem challenging in seeking on-hand solutions. In this paper, we aim to fill this gap and formulate the LMaaS market trading as a Stackelberg game with two steps. In the first step, we optimize the seller's pricing decision and propose an Iterative Model Pricing (IMP) algorithm that optimizes the prices of large models iteratively by reasoning customers' future rental decisions, which is able to achieve a near-optimal pricing solution. In the second step, we optimize customers' selection decisions by designing a robust selecting and renting (RSR) algorithm, which is guaranteed to be optimal with rigorous theoretical proof. Extensive experiments confirm the effectiveness and robustness of our algorithms.
Abstract:Foundation models have shown great success in natural language processing, computer vision, and multimodal tasks. FMs have a large number of model parameters, thus requiring a substantial amount of data to help optimize the model during the training. Federated learning has revolutionized machine learning by enabling collaborative learning from decentralized data while still preserving the data privacy of clients. Despite the great benefits foundation models can have empowered by federated learning, they face severe computation, communication, and statistical challenges. In this paper, we propose a novel two-stage federated learning algorithm called FedMS. A global expert is trained in the first stage and a local expert is trained in the second stage to provide better personalization. We construct a Mixture of Foundation Models (MoFM) with these two experts and design a gate neural network with an inserted gate adapter that joins the aggregation every communication round in the second stage. To further adapt to edge computing scenarios with limited computational resources, we design a novel Sparsely Activated LoRA (SAL) algorithm that freezes the pre-trained foundation model parameters inserts low-rank adaptation matrices into transformer blocks and activates them progressively during the training. We employ extensive experiments to verify the effectiveness of FedMS, results show that FedMS outperforms other SOTA baselines by up to 55.25% in default settings.