Building energy performance benchmarking has been widely adopted in the USA and Canada through the Energy Star Portfolio Manager platform. Building operations and energy management professionals have long used its simple 1-100 score to understand how a building compares to its peers. This single number is easy to use, but it is produced by inaccurate multiple linear regression (MLR) models. This paper proposes a methodology that enhances the existing Energy Star calculation method by increasing its accuracy and by adding model output processing that helps explain why a building achieves a certain score. We propose and test two new prediction models: multiple linear regression with feature interactions (MLRi) and gradient boosted trees (GBT). Both models have better average accuracy than the baseline Energy Star models. Averaged across six building types, the third-order MLRi and GBT models increase adjusted R² by 4.9% and 24.9%, respectively, and decrease normalized root mean squared error (NRMSE) by 7.0% and 13.7%, respectively, relative to the MLR models. Even more importantly, we develop a set of techniques based on SHAP values to help determine which factors most influence the score. The SHAP force visualization, in particular, gives an accessible overview of the building characteristics that influence the score, which non-technical users can readily interpret. The methodology is tested on the 2012 Commercial Building Energy Consumption Survey (CBECS) (1,812 buildings) and on public data sets from the energy disclosure programs of New York City (11,131 buildings) and Seattle (2,073 buildings).
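
For readers unfamiliar with the two accuracy metrics reported above, the following is a minimal sketch of how adjusted R² and NRMSE might be computed for a fitted model. It is an illustration under stated assumptions, not the paper's implementation: the function names `adjusted_r2` and `nrmse` are hypothetical, the sample values are made up, and RMSE is normalized here by the mean of the observed values, which is only one of several common conventions.

```python
import math


def adjusted_r2(y_true, y_pred, n_features):
    """Adjusted R^2: R^2 penalized for the number of predictors used."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)


def nrmse(y_true, y_pred):
    """RMSE divided by the mean observed value (assumed normalization)."""
    n = len(y_true)
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
    return rmse / (sum(y_true) / n)


# Made-up observed and predicted energy use values, for illustration only.
observed = [100.0, 150.0, 200.0, 250.0, 300.0]
predicted = [110.0, 140.0, 210.0, 240.0, 310.0]

adj_r2_value = adjusted_r2(observed, predicted, n_features=2)
nrmse_value = nrmse(observed, predicted)
print(f"adjusted R^2 = {adj_r2_value:.3f}, NRMSE = {nrmse_value:.3f}")
```

A higher adjusted R² and a lower NRMSE both indicate a better fit, which is why the improvements above are stated as an increase in the former and a decrease in the latter.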