Abstract:Purpose: The development of machine learning models for surgical workflow and instrument recognition from temporal data represents a challenging task due to the complex nature of surgical workflows. In particular, the imbalanced distribution of data is one of the major challenges in the domain of surgical workflow recognition. In order to obtain meaningful results, careful partitioning of data into training, validation, and test sets, as well as the selection of suitable evaluation metrics are crucial. Methods: In this work, we present an openly available web-based application that enables interactive exploration of dataset partitions. The proposed visual framework facilitates the assessment of dataset splits for surgical workflow recognition, especially with regard to identifying sub-optimal dataset splits. Currently, it supports visualization of surgical phase and instrument annotations. Results: In order to validate the dedicated interactive visualizations, we use a dataset split of the Cholec80 dataset. This dataset split was specifically selected to reflect a case of strong data imbalance. Using our software, we were able to identify phases, phase transitions, and combinations of surgical instruments that were not represented in one of the sets. Conclusion: In order to obtain meaningful results in highly unbalanced class distributions, special care should be taken with respect to the selection of an appropriate split. Interactive data visualization represents a promising approach for the assessment of machine learning datasets. The source code is available at https://github.com/Cardio-AI/endovis-ml