There are many areas of scientific endeavour where large, complex datasets are needed for benchmarking. Evolutionary computing provides a means towards creating such sets. As a case study, we consider Conway's Surreal numbers. They have largely been treated as a theoretical construct, with little effort towards empirical study, at least in part because of the difficulty of working with all but the smallest numbers. To advance this status, we need efficient algorithms, and in order to develop such we need benchmark data sets of surreal numbers. In this paper, we present a method for generating ensembles of random surreal numbers to benchmark algorithms. The approach uses an evolutionary algorithm to create the benchmark datasets where we can analyse and control features of the resulting test sets. Ultimately, the process is designed to generate networks with defined properties, and we expect this to be useful for other types of network data.