Abstract: Large language models (LLMs) have transformed natural language processing, with frameworks like Chatbot Arena providing pioneering platforms for evaluating these models. By facilitating millions of pairwise comparisons based on human judgments, Chatbot Arena has become a cornerstone in LLM evaluation, offering rich datasets for ranking models in open-ended conversational tasks. Building upon this foundation, we propose a statistical framework that incorporates key advancements to address specific challenges in pairwise comparison analysis. First, we introduce a factored tie model that enhances the ability to handle ties -- an integral aspect of human-judged comparisons -- significantly improving the model's fit to observed data. Second, we extend the framework to model covariance between competitors, enabling deeper insights into performance relationships and facilitating intuitive groupings into performance tiers. Third, we resolve optimization challenges arising from parameter non-uniqueness by introducing novel constraints, ensuring stable and interpretable parameter estimation. Through rigorous evaluation and extensive experimentation, our framework demonstrates substantial improvements over existing methods in modeling pairwise comparison data. To support reproducibility and practical adoption, we release leaderbot, an open-source Python package implementing our models and analyses.
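For background only (the factored tie model proposed above is not reproduced here), a standard way tie probabilities enter a Bradley-Terry-type pairwise comparison model is Davidson's extension: with positive merit parameters $p_i$ and a tie parameter $\nu \ge 0$,
$$
\Pr(i \succ j) \;=\; \frac{p_i}{p_i + p_j + \nu \sqrt{p_i p_j}},
\qquad
\Pr(i \sim j) \;=\; \frac{\nu \sqrt{p_i p_j}}{p_i + p_j + \nu \sqrt{p_i p_j}}.
$$
Fitting such a model amounts to maximizing the resulting multinomial likelihood over the observed win, loss, and tie counts.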
Abstract: We study a matrix that arises in a singular formulation of the Woodbury matrix identity, in the regime where the Woodbury identity no longer holds. We present generalized inverse and pseudo-determinant identities for such a matrix, which have direct applications to Gaussian process regression, in particular to its likelihood representation and its precision matrix. We also provide an efficient algorithm and numerical analysis for the presented determinant identities, and demonstrate their advantages under certain conditions that are applicable to computing the log-determinant terms in likelihood functions of Gaussian process regression.
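For reference, the standard (nonsingular) Woodbury matrix identity is
$$
(\mathbf{A} + \mathbf{U} \mathbf{C} \mathbf{V})^{-1}
\;=\;
\mathbf{A}^{-1} - \mathbf{A}^{-1} \mathbf{U} \left( \mathbf{C}^{-1} + \mathbf{V} \mathbf{A}^{-1} \mathbf{U} \right)^{-1} \mathbf{V} \mathbf{A}^{-1},
$$
which presumes that $\mathbf{A}$, $\mathbf{C}$, and the capacitance matrix $\mathbf{C}^{-1} + \mathbf{V} \mathbf{A}^{-1} \mathbf{U}$ are all invertible; the identities above concern the singular regime in which these assumptions fail and inverses and determinants are replaced by generalized inverses and pseudo-determinants.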
Abstract: We develop a computational procedure to estimate the covariance hyperparameters for semiparametric Gaussian process regression models with additive noise. Namely, the presented method can be used to efficiently estimate the variance of the correlated error and the variance of the noise by maximizing a marginal likelihood function. Our method involves suitably reducing the dimensionality of the hyperparameter space to simplify the estimation procedure to a univariate root-finding problem. Moreover, we derive bounds and asymptotes of the marginal likelihood function and its derivatives, which are useful for narrowing the initial range of the hyperparameter search. Using numerical examples, we demonstrate the computational advantages and robustness of the presented approach compared to traditional parameter optimization.
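As a minimal sketch of this kind of reduction (not the paper's algorithm, and using a zero-mean rather than semiparametric model): writing the data covariance as $\sigma^2 \mathbf{K} + \sigma_0^2 \mathbf{I}$ and reparameterizing by the noise-to-signal ratio $\eta = \sigma_0^2 / \sigma^2$ lets $\sigma^2$ be profiled out in closed form, leaving a single stationarity equation in $\eta$ for a univariate root finder. The kernel, synthetic data, and bracketing interval below are illustrative assumptions.

```python
# Minimal sketch (not the paper's algorithm): profile the GP marginal likelihood
# down to the noise-to-signal ratio eta = sigma0^2 / sigma^2 and solve
# d(log-likelihood)/d(eta) = 0 with a univariate root finder.
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(0)
n = 50
x = np.sort(rng.uniform(0.0, 1.0, n))
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * 0.2 ** 2))   # illustrative RBF kernel
z = rng.multivariate_normal(np.zeros(n), K + 0.1 * np.eye(n))  # synthetic observations

def profiled_loglik(eta):
    # For fixed eta, the signal variance has the closed form
    # sigma^2_hat(eta) = z^T (K + eta I)^{-1} z / n; substitute it back.
    B = K + eta * np.eye(n)
    _, logdet = np.linalg.slogdet(B)
    sigma2_hat = z @ np.linalg.solve(B, z) / n
    return -0.5 * (n * np.log(sigma2_hat) + logdet + n)

def score(eta, h=1e-6):
    # Numerical derivative for illustration; analytic derivatives with
    # bounds and asymptotes would be used to bracket the root in practice.
    return (profiled_loglik(eta + h) - profiled_loglik(eta - h)) / (2 * h)

eta_hat = brentq(score, 1e-4, 1e2)   # bracket chosen by inspection here
sigma2_hat = z @ np.linalg.solve(K + eta_hat * np.eye(n), z) / n
print(eta_hat, sigma2_hat, eta_hat * sigma2_hat)   # ratio, signal var., noise var.
```

The bounds and asymptotes mentioned in the abstract play the role of the hand-picked bracket above, narrowing where the root finder has to search.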
Abstract: We develop heuristic interpolation methods for the function $t \mapsto \operatorname{trace}\left( (\mathbf{A} + t \mathbf{B})^{-1} \right)$, where the matrices $\mathbf{A}$ and $\mathbf{B}$ are symmetric and positive definite and $t$ is a real variable. This function is featured in many applications in statistics, machine learning, and computational physics. The presented interpolation functions are based on a modification of a sharp upper bound that we derive for this function, which is a new trace inequality for matrices. We demonstrate the accuracy and performance of the proposed method with numerical examples, namely, the marginal maximum likelihood estimation for linear Gaussian process regression and the estimation of the regularization parameter of ridge regression with the generalized cross-validation method.
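As a point of reference (with a random SPD test matrix, not data from the paper), the sketch below evaluates $t \mapsto \operatorname{trace}\left( (\mathbf{A} + t \mathbf{B})^{-1} \right)$ exactly via Cholesky factorizations, which is the expensive computation an interpolant is meant to replace when many values of $t$ are needed; the crude rational stand-in at the end only matches the value at $t = 0$ and the large-$t$ decay and is not the paper's bound-based scheme.

```python
# Direct (exact) evaluation of f(t) = trace((A + t B)^{-1}) for SPD A and B.
# This is the costly baseline; an interpolant approximates f from a few such samples.
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(1)
n = 200
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)      # random symmetric positive-definite matrix
B = np.eye(n)                    # B = I, a common special case (e.g., A + t I)

def trace_inv(t):
    c, low = cho_factor(A + t * B)                 # Cholesky of A + t B
    return np.trace(cho_solve((c, low), np.eye(n)))

# A crude rational stand-in that matches f(0) and the large-t decay
# f(t) ~ trace(B^{-1}) / t; shown only for comparison, not the paper's scheme.
tau0 = trace_inv(0.0)
tauB = np.trace(np.linalg.inv(B))
crude = lambda t: 1.0 / (1.0 / tau0 + t / tauB)

for t in [0.0, 0.1, 1.0, 10.0]:
    print(f"t={t:5.1f}  exact={trace_inv(t):.6f}  crude={crude(t):.6f}")
```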