Similarity measures are fundamental tools for quantifying the alignment between artificial and biological systems. However, the diversity of similarity measures and their varied naming and implementation conventions makes it challenging to compare across studies. To facilitate comparisons and make explicit the implementation choices underlying a given code package, we have created and are continuing to develop a Python repository that benchmarks and standardizes similarity measures. The goal of creating a consistent naming convention that uniquely and efficiently specifies a similarity measure is not trivial as, for example, even commonly used methods like Centered Kernel Alignment (CKA) have at least 12 different variations, and this number will likely continue to grow as the field evolves. For this reason, we do not advocate for a fixed, definitive naming convention. The landscape of similarity measures and best practices will continue to change and so we see our current repository, which incorporates approximately 100 different similarity measures from 14 packages, as providing a useful tool at this snapshot in time. To accommodate the evolution of the field we present a framework for developing, validating, and refining naming conventions with the goal of uniquely and efficiently specifying similarity measures, ultimately making it easier for the community to make comparisons across studies.