Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yuhang Yao

Toward Super Agent System with Hybrid AI Routers

Apr 11, 2025

Yuhang Yao, Haixin Wang, Yibo Chen, Jiawen Wang, Min Chang Jordan Ren, Bosheng Ding, Salman Avestimehr, Chaoyang He

Abstract:AI Agents powered by Large Language Models are transforming the world through enormous applications. A super agent has the potential to fulfill diverse user needs, such as summarization, coding, and research, by accurately understanding user intent and leveraging the appropriate tools to solve tasks. However, to make such an agent viable for real-world deployment and accessible at scale, significant optimizations are required to ensure high efficiency and low cost. This paper presents a design of the Super Agent System. Upon receiving a user prompt, the system first detects the intent of the user, then routes the request to specialized task agents with the necessary tools or automatically generates agentic workflows. In practice, most applications directly serve as AI assistants on edge devices such as phones and robots. As different language models vary in capability and cloud-based models often entail high computational costs, latency, and privacy concerns, we then explore the hybrid mode where the router dynamically selects between local and cloud models based on task complexity. Finally, we introduce the blueprint of an on-device super agent enhanced with cloud. With advances in multi-modality models and edge hardware, we envision that most computations can be handled locally, with cloud collaboration only as needed. Such architecture paves the way for super agents to be seamlessly integrated into everyday life in the near future.

Via

Access Paper or Ask Questions

FedGAT: A Privacy-Preserving Federated Approximation Algorithm for Graph Attention Networks

Dec 20, 2024

Siddharth Ambekar, Yuhang Yao, Ryan Li, Carlee Joe-Wong

Figure 1 for FedGAT: A Privacy-Preserving Federated Approximation Algorithm for Graph Attention Networks

Figure 2 for FedGAT: A Privacy-Preserving Federated Approximation Algorithm for Graph Attention Networks

Figure 3 for FedGAT: A Privacy-Preserving Federated Approximation Algorithm for Graph Attention Networks

Figure 4 for FedGAT: A Privacy-Preserving Federated Approximation Algorithm for Graph Attention Networks

Abstract:Federated training methods have gained popularity for graph learning with applications including friendship graphs of social media sites and customer-merchant interaction graphs of huge online marketplaces. However, privacy regulations often require locally generated data to be stored on local clients. The graph is then naturally partitioned across clients, with no client permitted access to information stored on another. Cross-client edges arise naturally in such cases and present an interesting challenge to federated training methods, as training a graph model at one client requires feature information of nodes on the other end of cross-client edges. Attempting to retain such edges often incurs significant communication overhead, and dropping them altogether reduces model performance. In simpler models such as Graph Convolutional Networks, this can be fixed by communicating a limited amount of feature information across clients before training, but GATs (Graph Attention Networks) require additional information that cannot be pre-communicated, as it changes from training round to round. We introduce the Federated Graph Attention Network (FedGAT) algorithm for semi-supervised node classification, which approximates the behavior of GATs with provable bounds on the approximation error. FedGAT requires only one pre-training communication round, significantly reducing the communication overhead for federated GAT training. We then analyze the error in the approximation and examine the communication overhead and computational complexity of the algorithm. Experiments show that FedGAT achieves nearly the same accuracy as a GAT model in a centralised setting, and its performance is robust to the number of clients as well as data distribution.

Via

Access Paper or Ask Questions

Alopex: A Computational Framework for Enabling On-Device Function Calls with LLMs

Nov 07, 2024

Yide Ran, Zhaozhuo Xu, Yuhang Yao, Zijian Hu, Shanshan Han, Han Jin, Alay Dilipbhai Shah, Jipeng Zhang, Dimitris Stripelis, Tong Zhang(+2 more)

Figure 1 for Alopex: A Computational Framework for Enabling On-Device Function Calls with LLMs

Figure 2 for Alopex: A Computational Framework for Enabling On-Device Function Calls with LLMs

Figure 3 for Alopex: A Computational Framework for Enabling On-Device Function Calls with LLMs

Figure 4 for Alopex: A Computational Framework for Enabling On-Device Function Calls with LLMs

Abstract:The rapid advancement of Large Language Models (LLMs) has led to their increased integration into mobile devices for personalized assistance, which enables LLMs to call external API functions to enhance their performance. However, challenges such as data scarcity, ineffective question formatting, and catastrophic forgetting hinder the development of on-device LLM agents. To tackle these issues, we propose Alopex, a framework that enables precise on-device function calls using the Fox LLM. Alopex introduces a logic-based method for generating high-quality training data and a novel ``description-question-output'' format for fine-tuning, reducing risks of function information leakage. Additionally, a data mixing strategy is used to mitigate catastrophic forgetting, combining function call data with textbook datasets to enhance performance in various tasks. Experimental results show that Alopex improves function call accuracy and significantly reduces catastrophic forgetting, providing a robust solution for integrating function call capabilities into LLMs without manual intervention.

Via

Access Paper or Ask Questions

FedGraph: A Research Library and Benchmark for Federated Graph Learning

Oct 08, 2024

Yuhang Yao, Yuan Li, Xinyi Fan, Junhao Li, Kay Liu, Weizhao Jin, Srivatsan Ravi, Philip S. Yu, Carlee Joe-Wong

Figure 1 for FedGraph: A Research Library and Benchmark for Federated Graph Learning

Figure 2 for FedGraph: A Research Library and Benchmark for Federated Graph Learning

Figure 3 for FedGraph: A Research Library and Benchmark for Federated Graph Learning

Figure 4 for FedGraph: A Research Library and Benchmark for Federated Graph Learning

Abstract:Federated graph learning is an emerging field with significant practical challenges. While many algorithms have been proposed to enhance model accuracy, their system performance, crucial for real-world deployment, is often overlooked. To address this gap, we present FedGraph, a research library designed for practical distributed deployment and benchmarking in federated graph learning. FedGraph supports a range of state-of-the-art methods and includes profiling tools for system performance evaluation, focusing on communication and computation costs during training. FedGraph can then facilitate the development of practical applications and guide the design of future algorithms.

* https://github.com/FedGraph/fedgraph

Via

Access Paper or Ask Questions

PolyRouter: A Multi-LLM Querying System

Aug 26, 2024

Dimitris Stripelis, Zijian Hu, Jipeng Zhang, Zhaozhuo Xu, Alay Dilipbhai Shah, Han Jin, Yuhang Yao, Salman Avestimehr, Chaoyang He

Figure 1 for PolyRouter: A Multi-LLM Querying System

Figure 2 for PolyRouter: A Multi-LLM Querying System

Figure 3 for PolyRouter: A Multi-LLM Querying System

Figure 4 for PolyRouter: A Multi-LLM Querying System

Abstract:With the rapid growth of Large Language Models (LLMs) across various domains, numerous new LLMs have emerged, each possessing domain-specific expertise. This proliferation has highlighted the need for quick, high-quality, and cost-effective LLM query response methods. Yet, no single LLM exists to efficiently balance this trilemma. Some models are powerful but extremely costly, while others are fast and inexpensive but qualitatively inferior. To address this challenge, we present PolyRouter, a non-monolithic LLM querying system that seamlessly integrates various LLM experts into a single query interface and dynamically routes incoming queries to the most high-performant expert based on query's requirements. Through extensive experiments, we demonstrate that when compared to standalone expert models, PolyRouter improves query efficiency by up to 40%, and leads to significant cost reductions of up to 30%, while maintaining or enhancing model performance by up to 10%.

* 14 pages, 7 figures, 2 tables

Via

Access Paper or Ask Questions

ScaleLLM: A Resource-Frugal LLM Serving Framework by Optimizing End-to-End Efficiency

Jul 23, 2024

Yuhang Yao, Han Jin, Alay Dilipbhai Shah, Shanshan Han, Zijian Hu, Yide Ran, Dimitris Stripelis, Zhaozhuo Xu, Salman Avestimehr, Chaoyang He

Abstract:Large language models (LLMs) have surged in popularity and are extensively used in commercial applications, where the efficiency of model serving is crucial for the user experience. Most current research focuses on optimizing individual sub-procedures, e.g. local inference and communication, however, there is no comprehensive framework that provides a holistic system view for optimizing LLM serving in an end-to-end manner. In this work, we conduct a detailed analysis to identify major bottlenecks that impact end-to-end latency in LLM serving systems. Our analysis reveals that a comprehensive LLM serving endpoint must address a series of efficiency bottlenecks that extend beyond LLM inference. We then propose ScaleLLM, an optimized system for resource-efficient LLM serving. Our extensive experiments reveal that with 64 concurrent requests, ScaleLLM achieves a 4.3x speed up over vLLM and outperforms state-of-the-arts with 1.5x higher throughput.

Via

Access Paper or Ask Questions

TorchOpera: A Compound AI System for LLM Safety

Jun 16, 2024

Shanshan Han, Yuhang Yao, Zijian Hu, Dimitris Stripelis, Zhaozhuo Xu, Chaoyang He

Abstract:We introduce TorchOpera, a compound AI system for enhancing the safety and quality of prompts and responses for Large Language Models. TorchOpera ensures that all user prompts are safe, contextually grounded, and effectively processed, while enhancing LLM responses to be relevant and high quality. TorchOpera utilizes the vector database for contextual grounding, rule-based wrappers for flexible modifications, and specialized mechanisms for detecting and adjusting unsafe or incorrect content. We also provide a view of the compound AI system to reduce the computational cost. Extensive experiments show that TorchOpera ensures the safety, reliability, and applicability of LLMs in real-world settings while maintaining the efficiency of LLM responses.

Via

Access Paper or Ask Questions

LLM Multi-Agent Systems: Challenges and Open Problems

Feb 05, 2024

Shanshan Han, Qifan Zhang, Yuhang Yao, Weizhao Jin, Zhaozhuo Xu, Chaoyang He

Abstract:This paper explores existing works of multi-agent systems and identifies challenges that remain inadequately addressed. By leveraging the diverse capabilities and roles of individual agents within a multi-agent system, these systems can tackle complex tasks through collaboration. We discuss optimizing task allocation, fostering robust reasoning through iterative debates, managing complex and layered context information, and enhancing memory management to support the intricate interactions within multi-agent systems. We also explore the potential application of multi-agent systems in blockchain systems to shed light on their future development and application in real-world distributed systems.

Via

Access Paper or Ask Questions

Kick Bad Guys Out! Zero-Knowledge-Proof-Based Anomaly Detection in Federated Learning

Oct 06, 2023

Shanshan Han, Wenxuan Wu, Baturalp Buyukates, Weizhao Jin, Yuhang Yao, Qifan Zhang, Salman Avestimehr, Chaoyang He

Figure 1 for Kick Bad Guys Out! Zero-Knowledge-Proof-Based Anomaly Detection in Federated Learning

Figure 2 for Kick Bad Guys Out! Zero-Knowledge-Proof-Based Anomaly Detection in Federated Learning

Figure 3 for Kick Bad Guys Out! Zero-Knowledge-Proof-Based Anomaly Detection in Federated Learning

Figure 4 for Kick Bad Guys Out! Zero-Knowledge-Proof-Based Anomaly Detection in Federated Learning

Abstract:Federated learning (FL) systems are vulnerable to malicious clients that submit poisoned local models to achieve their adversarial goals, such as preventing the convergence of the global model or inducing the global model to misclassify some data. Many existing defense mechanisms are impractical in real-world FL systems, as they require prior knowledge of the number of malicious clients or rely on re-weighting or modifying submissions. This is because adversaries typically do not announce their intentions before attacking, and re-weighting might change aggregation results even in the absence of attacks. To address these challenges in real FL systems, this paper introduces a cutting-edge anomaly detection approach with the following features: i) Detecting the occurrence of attacks and performing defense operations only when attacks happen; ii) Upon the occurrence of an attack, further detecting the malicious client models and eliminating them without harming the benign ones; iii) Ensuring honest execution of defense mechanisms at the server by leveraging a zero-knowledge proof mechanism. We validate the superior performance of the proposed approach with extensive experiments.

Via

Access Paper or Ask Questions

FedML-HE: An Efficient Homomorphic-Encryption-Based Privacy-Preserving Federated Learning System

Mar 20, 2023

Weizhao Jin, Yuhang Yao, Shanshan Han, Carlee Joe-Wong, Srivatsan Ravi, Salman Avestimehr, Chaoyang He

Figure 1 for FedML-HE: An Efficient Homomorphic-Encryption-Based Privacy-Preserving Federated Learning System

Figure 2 for FedML-HE: An Efficient Homomorphic-Encryption-Based Privacy-Preserving Federated Learning System

Figure 3 for FedML-HE: An Efficient Homomorphic-Encryption-Based Privacy-Preserving Federated Learning System

Figure 4 for FedML-HE: An Efficient Homomorphic-Encryption-Based Privacy-Preserving Federated Learning System

Abstract:Federated Learning (FL) enables machine learning model training on distributed edge devices by aggregating local model updates rather than local data. However, privacy concerns arise as the FL server's access to local model updates can potentially reveal sensitive personal information by performing attacks like gradient inversion recovery. To address these concerns, privacy-preserving methods, such as Homomorphic Encryption (HE)-based approaches, have been proposed. Despite HE's post-quantum security advantages, its applications suffer from impractical overheads. In this paper, we present FedML-HE, the first practical system for efficient HE-based secure federated aggregation that provides a user/device-friendly deployment platform. FL-HE utilizes a novel universal overhead optimization scheme, significantly reducing both computation and communication overheads during deployment while providing customizable privacy guarantees. Our optimized system demonstrates considerable overhead reduction, particularly for large models (e.g., ~10x reduction for HE-federated training of ResNet-50 and ~40x reduction for BERT), demonstrating the potential for scalable HE-based FL deployment.

Via

Access Paper or Ask Questions