Abstract:Surgical data science (SDS) is a field that analyzes patient data before, during, and after surgery to improve surgical outcomes and skills. However, surgical data is scarce, heterogeneous, and complex, which limits the applicability of existing machine learning methods. In this work, we introduce the novel task of future video generation in laparoscopic surgery. This task can augment and enrich the existing surgical data and enable various applications, such as simulation, analysis, and robot-aided surgery. Ultimately, it involves not only understanding the current state of the operation but also accurately predicting the dynamic and often unpredictable nature of surgical procedures. Our proposed method, VISAGE (VIdeo Synthesis using Action Graphs for Surgery), leverages the power of action scene graphs to capture the sequential nature of laparoscopic procedures and utilizes diffusion models to synthesize temporally coherent video sequences. VISAGE predicts the future frames given only a single initial frame, and the action graph triplets. By incorporating domain-specific knowledge through the action graph, VISAGE ensures the generated videos adhere to the expected visual and motion patterns observed in real laparoscopic procedures. The results of our experiments demonstrate high-fidelity video generation for laparoscopy procedures, which enables various applications in SDS.
Abstract:Existing multi-label frameworks only exploit the information deduced from the bipartition of the labels into a positive and negative set. Therefore, they do not benefit from the ranking order between positive labels, which is the concept we introduce in this paper. We propose a novel multi-label ranking method: GaussianMLR, which aims to learn implicit class significance values that determine the positive label ranks instead of treating them as of equal importance, by following an approach that unifies ranking and classification tasks associated with multi-label ranking. Due to the scarcity of public datasets, we introduce eight synthetic datasets generated under varying importance factors to provide an enriched and controllable experimental environment for this study. On both real-world and synthetic datasets, we carry out extensive comparisons with relevant baselines and evaluate the performance on both of the two sub-tasks. We show that our method is able to accurately learn a representation of the incorporated positive rank order, which is not only consistent with the ground truth but also proportional to the underlying information. We strengthen our claims empirically by conducting comprehensive experimental studies. Code is available at https://github.com/MrGranddy/GaussianMLR.
Abstract:Multi-label ranking maps instances to a ranked set of predicted labels from multiple possible classes. The ranking approach for multi-label learning problems received attention for its success in multi-label classification, with one of the well-known approaches being pairwise label ranking. However, most existing methods assume that only partial information about the preference relation is known, which is inferred from the partition of labels into a positive and negative set, then treat labels with equal importance. In this paper, we focus on the unique challenge of ranking when the order of the true label set is provided. We propose a novel dedicated loss function to optimize models by incorporating penalties for incorrectly ranked pairs, and make use of the ranking information present in the input. Our method achieves the best reported performance measures on both synthetic and real world ranked datasets and shows improvements on overall ranking of labels. Our experimental results demonstrate that our approach is generalizable to a variety of multi-label classification and ranking tasks, while revealing a calibration towards a certain ranking ordering.