Miha Malenšek

Generating Diverse Synthetic Datasets for Evaluation of Real-life Recommender Systems

Nov 27, 2024

Miha Malenšek, Blaž Škrlj, Blaž Mramor, Jure Demšar

Abstract: Synthetic datasets are important for evaluating and testing machine learning models. When evaluating real-life recommender systems, high-dimensional, sparse categorical datasets are often considered. Unfortunately, few solutions allow the generation of artificial datasets with such characteristics. For that purpose, we developed a novel framework for generating synthetic datasets that are diverse and statistically coherent. Our framework allows for the creation of datasets with controlled attributes, enabling iterative modifications to fit specific experimental needs, such as introducing complex feature interactions, feature cardinality, or specific distributions. We demonstrate the framework's utility through use cases such as benchmarking probabilistic counting algorithms, detecting algorithmic bias, and simulating AutoML searches. Unlike existing methods, which either focus narrowly on specific dataset structures or prioritize (private) data synthesis from real data, our approach provides a modular means of quickly generating fully synthetic datasets that can be tailored to diverse experimental requirements. Our results show that the framework effectively isolates model behavior in unique situations and highlight its potential for significant advancements in the evaluation and development of recommender systems. The framework is available as a free, open Python package to facilitate research with minimal friction.
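To make the idea of "controlled attributes" concrete, the following is a minimal sketch of generating a sparse, high-cardinality categorical dataset with a planted feature interaction. It is not the authors' package or its API: the function names (`zipf_weights`, `make_categorical_dataset`, `add_interaction_label`), the Zipf-like skew, and the parity-based interaction rule are all assumptions chosen for illustration, using only the Python standard library.

```python
import random


def zipf_weights(cardinality, exponent=1.2):
    """Unnormalised Zipf-like weights: category k gets (k + 1) ** -exponent.

    Hypothetical helper; the skew is an assumed stand-in for a
    user-chosen distribution.
    """
    return [(k + 1) ** -exponent for k in range(cardinality)]


def make_categorical_dataset(n_rows, cardinalities, exponent=1.2, seed=0):
    """Sample one integer-coded categorical column per requested cardinality."""
    rng = random.Random(seed)
    columns = [
        rng.choices(range(c), weights=zipf_weights(c, exponent), k=n_rows)
        for c in cardinalities
    ]
    # Transpose the column lists into a list of rows.
    return [list(row) for row in zip(*columns)]


def add_interaction_label(rows, i, j, base_rate=0.05, lift=0.6, seed=0):
    """Binary label whose positive rate rises only when features i and j
    are both even-coded (an assumed, purely illustrative interaction)."""
    rng = random.Random(seed)
    labels = []
    for row in rows:
        active = row[i] % 2 == 0 and row[j] % 2 == 0
        p = base_rate + lift if active else base_rate
        labels.append(1 if rng.random() < p else 0)
    return labels


# Three features with cardinalities 1000, 50 and 5; the label depends on
# the interaction between the first two.
rows = make_categorical_dataset(5_000, [1000, 50, 5], seed=42)
labels = add_interaction_label(rows, 0, 1, seed=42)
```

Because the cardinalities, the skew, and the interaction are explicit parameters, a dataset like this can be regenerated with one attribute changed at a time, which is the kind of iterative modification the abstract describes. The exact cardinality of each column is also known in advance, so the same data could serve as ground truth when checking an approximate distinct-count estimator.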

* RecSys 2024
