Abstract: Data generated by audits of social media websites form the basis of our understanding of the biases present in algorithmic content recommendation systems. As legislators around the world begin to consider regulating the algorithmic systems that drive online platforms, it is critical to ensure the correctness of these inferred biases. However, as we show in this paper, doing so is challenging for a variety of reasons related to the complexity of the configuration parameters associated with audits that gather data from a specific platform. Focusing specifically on YouTube, we show that conducting audits to make inferences about YouTube's recommendation systems is more methodologically challenging than one might expect. Many methodological decisions must be made in order to obtain scientifically valid results, and each of these decisions incurs costs. For example, should an auditor use (expensive-to-obtain) logged-in YouTube accounts when gathering recommendations from the algorithm in order to obtain more accurate inferences? We explore the impact of this and many other decisions and make some startling discoveries about the methodological choices that affect YouTube's recommendations. Taken together, our research suggests configuration compromises that YouTube auditors and researchers can use to reduce audit overhead, both economically and computationally, without sacrificing the accuracy of their inferences. We also identify several configuration parameters that have a significant impact on the accuracy of measured inferences and should be considered carefully.