Abstract: Modern distributed storage systems come with a plethora of configurable parameters that control module behavior and affect system performance. Default settings provided by developers are often suboptimal for specific use cases. Tuning these parameters can yield significant performance gains, but it is a difficult task that demands deep experience and expertise because of the immense number of configurable parameters, complex internal dependencies, and non-linear system behaviors. To overcome these difficulties, we propose Sapphire, an automatic simulation-based approach that recommends optimal configurations by leveraging machine learning and black-box optimization techniques. We evaluate Sapphire on Ceph. Results show that Sapphire significantly boosts Ceph performance, to 2.2x that of the default configuration.
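
As a rough illustration of what black-box configuration tuning means in general, the following minimal sketch runs a plain random search over a tiny configuration space. It is not Sapphire's algorithm: the parameter names (`cache_size_mb`, `worker_threads`), their bounds, the default values, and the synthetic `simulated_throughput` function are all hypothetical stand-ins, and random search is used only because it is the simplest black-box optimizer to show.

```python
import random

# Hypothetical search space: parameter name -> (low, high) integer bounds.
# These are illustrative names, not real Ceph options.
SEARCH_SPACE = {
    "cache_size_mb": (64, 4096),
    "worker_threads": (1, 64),
}

# Hypothetical default configuration, assumed to be far from the optimum.
DEFAULT_CONFIG = {"cache_size_mb": 128, "worker_threads": 2}


def simulated_throughput(config):
    """Synthetic stand-in for a benchmark run.

    Peaks at 100.0 when cache_size_mb == 2048 and worker_threads == 16,
    and falls off as either parameter moves away from that point.
    """
    cache_penalty = ((config["cache_size_mb"] - 2048) / 2048) ** 2
    thread_penalty = ((config["worker_threads"] - 16) / 16) ** 2
    return 100.0 / (1.0 + cache_penalty + thread_penalty)


def random_search(objective, space, start, trials=200, seed=0):
    """Treat the objective as a black box and keep the best sampled config."""
    rng = random.Random(seed)
    best_config, best_score = dict(start), objective(start)
    for _ in range(trials):
        # Sample every parameter uniformly within its bounds.
        candidate = {
            name: rng.randint(low, high) for name, (low, high) in space.items()
        }
        score = objective(candidate)
        if score > best_score:
            best_config, best_score = candidate, score
    return best_config, best_score


best_config, best_score = random_search(
    simulated_throughput, SEARCH_SPACE, DEFAULT_CONFIG
)
print("default score:", simulated_throughput(DEFAULT_CONFIG))
print("best config:", best_config, "score:", best_score)
```

The optimizer only ever calls `objective(config)` and reads back a score, which is what makes the approach black-box: the same loop would work if the synthetic function were replaced by a real benchmark run or by a learned performance model.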