Abstract:Evaluating performance across optimization algorithms on many problems presents a complex challenge due to the diversity of numerical scales involved. Traditional data processing methods, such as hypothesis testing and Bayesian inference, often employ ranking-based methods to normalize performance values across these varying scales. However, a significant issue emerges with this ranking-based approach: the introduction of new algorithms can potentially disrupt the original rankings. This paper extensively explores the problem, making a compelling case to underscore the issue and conducting a thorough analysis of its root causes. These efforts pave the way for a comprehensive examination of potential solutions. Building on this research, this paper introduces a new mathematical model called "absolute ranking" and a sampling-based computational method. These contributions come with practical implementation recommendations, aimed at providing a more robust framework for addressing the challenge of numerical scale variation in the assessment of performance across multiple algorithms and problems.