Abstract: In a multi-tenant large language model (LLM) serving platform hosting diverse applications, some users may submit an excessive number of requests, making the service unavailable to other users and creating unfairness. Existing fairness approaches do not account for variations in token lengths across applications or for multiple LLM calls, making them unsuitable for such platforms. To address this fairness challenge, this paper analyzes millions of requests from thousands of users on MS CoPilot, a real-world multi-tenant LLM platform hosted by Microsoft. Our analysis confirms the inadequacy of existing methods and guides the development of FairServe, a system that ensures fair LLM access across diverse applications. FairServe combines application-characteristic-aware request throttling with a scheduling technique based on weighted service counters to curb abusive behavior and ensure fairness. Our experimental results on real-world traces demonstrate that FairServe ensures fairness better than the state-of-the-art method. We are actively working on deploying our system in production, where we expect it to benefit millions of customers worldwide.
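To make the idea of scheduling by weighted service counters concrete, the following is a minimal sketch and not FairServe's actual implementation. It assumes that each user accumulates a counter of weighted input and output tokens and that the scheduler always serves the backlogged user with the smallest counter. The class name `WeightedServiceScheduler`, the helper `serve_order`, and the weights `input_weight` and `output_weight` are hypothetical and chosen only for illustration.

```python
from collections import deque


class WeightedServiceScheduler:
    """Illustrative scheduler: serve the backlogged user with the least weighted service."""

    def __init__(self, input_weight=1.0, output_weight=2.0):
        # Hypothetical weights: output tokens are assumed to cost more than input tokens.
        self.input_weight = input_weight
        self.output_weight = output_weight
        self.queues = {}    # user -> FIFO queue of (input_tokens, output_tokens)
        self.counters = {}  # user -> weighted service received so far

    def submit(self, user, input_tokens, output_tokens):
        self.queues.setdefault(user, deque()).append((input_tokens, output_tokens))
        self.counters.setdefault(user, 0.0)

    def next_request(self):
        """Return the user whose request is served next, or None if all queues are empty."""
        backlogged = [u for u, q in self.queues.items() if q]
        if not backlogged:
            return None
        # Pick the smallest counter; ties are broken by user name for determinism.
        user = min(backlogged, key=lambda u: (self.counters[u], u))
        input_tokens, output_tokens = self.queues[user].popleft()
        self.counters[user] += (
            self.input_weight * input_tokens + self.output_weight * output_tokens
        )
        return user


def serve_order(requests):
    """Submit all (user, input_tokens, output_tokens) requests, then drain the scheduler."""
    scheduler = WeightedServiceScheduler()
    for user, input_tokens, output_tokens in requests:
        scheduler.submit(user, input_tokens, output_tokens)
    order = []
    while True:
        user = scheduler.next_request()
        if user is None:
            return order
        order.append(user)
```

Under these assumptions, a user who floods the queue cannot starve a light user: once the heavy user has received some service, the light user's smaller counter moves its request ahead of the heavy user's remaining backlog.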
Abstract: Bayesian Optimization (BO) is used to find the global optima of black-box functions. In this work, we propose a practical BO method for function compositions in which the form of the composition is known but the constituent functions are expensive to evaluate. By assuming an independent Gaussian process (GP) model for each constituent black-box function, we propose BO algorithms based on expected improvement (EI) and upper confidence bound (UCB) and demonstrate that they outperform vanilla BO and the current state-of-the-art algorithms. We also demonstrate a novel application of the proposed methods to dynamic pricing in revenue management when the underlying demand function is expensive to evaluate.
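As a rough illustration of how an acquisition value for a known composition might be obtained from independent models of its constituents, the sketch below estimates a UCB-style score by Monte Carlo sampling. It is not the algorithm proposed in the paper. It assumes that each constituent's GP posterior at a candidate point has already been summarized by a mean and a standard deviation, and the function name `composite_ucb` and the parameters `beta`, `n_samples`, and `seed` are hypothetical.

```python
import random
import statistics


def composite_ucb(g, means, stds, beta=2.0, n_samples=2000, seed=0):
    """Monte Carlo UCB-style score for g(h_1(x), ..., h_k(x)) at one candidate x.

    g      -- the known composition, called as g(h_1, ..., h_k)
    means  -- assumed posterior means of the constituents at x
    stds   -- assumed posterior standard deviations of the constituents at x
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n_samples):
        # Independent models: sample each constituent separately, then compose.
        draws = [rng.gauss(m, s) for m, s in zip(means, stds)]
        values.append(g(*draws))
    # Score = sample mean plus beta times sample standard deviation.
    return statistics.fmean(values) + beta * statistics.pstdev(values)


def revenue(price, demand):
    """Toy composition for a pricing example: revenue = price * demand."""
    return price * demand
```

For example, `composite_ucb(revenue, [2.0, 3.0], [0.0, 0.5])` scores a candidate whose price is known exactly and whose demand is uncertain. A BO loop built on this sketch would evaluate the score for each candidate and query the one with the highest value.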
Abstract: Computer-Mediated Communication (CMC) has revolutionized the way people communicate with each other. The increasing number of people interacting through the internet and the rise of new platforms and technologies have brought together people from different social, cultural and geographical backgrounds to present their thoughts, ideas and opinions on topics of their interest. CMC has, in some cases, given users more freedom to express themselves than face-to-face communication. This has also led to a rise in the uninhibited use of hostile and aggressive language and terminology. Since such language is detrimental to the discussion process and negatively affects both the audience and individuals, efforts are being made to control it. This research sees the need to understand the concept of flaming and hence attempts to classify flames in order to give a better understanding of the phenomenon. The classification is based on the type of flame content presented and the style in which it is presented.