Abstract:Large language models (LLMs) have achieved remarkable performance and are widely deployed in various applications, while the serving of LLM inference has raised concerns about user experience and serving throughput. Accordingly, service level objectives (SLOs) and goodput-the number of requests that meet SLOs per second-are introduced to evaluate the performance of LLM serving. However, existing metrics fail to capture the nature of user experience. We observe two ridiculous phenomena in existing metrics: 1) delaying token delivery can smooth the tail time between tokens (tail TBT) of a request and 2) dropping the request that fails to meet the SLOs midway can improve goodput. In this paper, we revisit SLO and goodput metrics in LLM serving and propose a unified metric framework smooth goodput including SLOs and goodput to reflect the nature of user experience in LLM serving. The framework can adapt to specific goals of different tasks by setting parameters. We re-evaluate the performance of different LLM serving systems under multiple workloads based on this unified framework and provide possible directions for future optimization of existing strategies. We hope that this framework can provide a unified standard for evaluating LLM serving and foster researches in the field of LLM serving optimization to move in a cohesive direction.
Abstract:Combining machine learning and formal methods (FMs) provides a possible solution to overcome the safety issue of autonomous driving (AD) vehicles. However, there are gaps to be bridged before this combination becomes practically applicable and useful. In an attempt to facilitate researchers in both FMs and AD areas, this paper proposes a framework that combines two well-known tools, namely CommonRoad and UPPAAL. On the one hand, CommonRoad can be enhanced by the rigorous semantics of models in UPPAAL, which enables a systematic and comprehensive understanding of the AD system's behaviour and thus strengthens the safety of the system. On the other hand, controllers synthesised by UPPAAL can be visualised by CommonRoad in real-world road networks, which facilitates AD vehicle designers greatly adopting formal models in system design. In this framework, we provide automatic model conversions between CommonRoad and UPPAAL. Therefore, users only need to program in Python and the framework takes care of the formal models, learning, and verification in the backend. We perform experiments to demonstrate the applicability of our framework in various AD scenarios, discuss the advantages of solving motion planning in our framework, and show the scalability limit and possible solutions.
Abstract:Bayesian optimization is a broadly applied methodology to optimize the expensive black-box function. Despite its success, it still faces the challenge from the high-dimensional search space. To alleviate this problem, we propose a novel Bayesian optimization framework (termed SILBO), which finds a low-dimensional space to perform Bayesian optimization iteratively through semi-supervised dimension reduction. SILBO incorporates both labeled points and unlabeled points acquired from the acquisition function to guide the embedding space learning. To accelerate the learning procedure, we present a randomized method for generating the projection matrix. Furthermore, to map from the low-dimensional space to the high-dimensional original space, we propose two mapping strategies: $\text{SILBO}_{FZ}$ and $\text{SILBO}_{FX}$ according to the evaluation overhead of the objective function. Experimental results on both synthetic function and hyperparameter optimization tasks demonstrate that SILBO outperforms the existing state-of-the-art high-dimensional Bayesian optimization methods.