Abstract: We consider the problem of teaching via demonstrations in sequential decision-making settings. In particular, we study how to design a personalized curriculum over demonstrations to speed up the learner's convergence. We provide a unified curriculum strategy for two popular learner models: Maximum Causal Entropy Inverse Reinforcement Learning (MaxEnt-IRL) and Cross-Entropy Behavioral Cloning (CrossEnt-BC). Our unified strategy induces a ranking over demonstrations based on difficulty scores computed with respect to the teacher's optimal policy and the learner's current policy. Compared to the state of the art, our strategy does not require access to the learner's internal dynamics, yet still enjoys similar convergence guarantees under mild technical conditions. Furthermore, we adapt our curriculum strategy to teach a learner using domain knowledge in the form of task-specific difficulty scores when the teacher's optimal policy is unknown. Experiments on a car-driving simulator environment and on shortest-path problems in a grid-world environment demonstrate the effectiveness of the proposed curriculum strategy.
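To make the ranking idea in the first abstract concrete, here is a minimal sketch for tabular policies. It assumes, as a hypothetical choice not specified in the abstract, that a demonstration's difficulty under a policy is its negative log-likelihood under that policy, and that demonstrations are ranked by learner difficulty minus teacher difficulty (equivalently, the log of the teacher-to-learner likelihood ratio). The names `difficulty` and `rank_demonstrations` are illustrative only.

```python
import math
from typing import Dict, List, Tuple

# A demonstration is a sequence of (state, action) pairs.
Demo = List[Tuple[int, int]]
# A tabular stochastic policy: policy[state][action] = probability.
Policy = Dict[int, Dict[int, float]]


def log_likelihood(policy: Policy, demo: Demo) -> float:
    """Sum of log pi(a | s) over all steps of the demonstration."""
    return sum(math.log(policy[s][a]) for s, a in demo)


def difficulty(policy: Policy, demo: Demo) -> float:
    """Hypothetical difficulty score: negative log-likelihood under the policy.

    Demonstrations that the policy finds unlikely count as harder for it.
    """
    return -log_likelihood(policy, demo)


def rank_demonstrations(teacher: Policy, learner: Policy, demos: List[Demo]) -> List[int]:
    """Return demo indices ordered by (learner difficulty - teacher difficulty).

    A high score marks a demo that is easy for the teacher but still hard for
    the learner, so it is placed earlier in the curriculum.
    """
    scores = [difficulty(learner, d) - difficulty(teacher, d) for d in demos]
    return sorted(range(len(demos)), key=lambda i: scores[i], reverse=True)


teacher = {0: {0: 0.1, 1: 0.9}, 1: {0: 0.8, 1: 0.2}}
learner = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.5, 1: 0.5}}  # uniform, untrained learner
demos = [[(0, 1), (1, 0)], [(0, 0)], [(1, 1)]]
print(rank_demonstrations(teacher, learner, demos))  # [0, 2, 1]
```

Because the score depends only on the two policies' action probabilities, this kind of ranking can be recomputed as the learner's policy changes without inspecting the learner's internal update rule.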
Abstract: This work presents a novel approach for the early recognition of the type of a laparoscopic surgery from its video. Early recognition algorithms can benefit the development of 'smart' OR systems that provide automatic context-aware assistance, and can also enable quick database indexing. The task is, however, fraught with challenges specific to laparoscopic videos, such as high visual similarity across surgery types and large variations in video duration. To capture the spatio-temporal dependencies in these videos, we use a model that combines a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network. We then propose two complementary approaches for improving early recognition performance. The first is a CNN fine-tuning method that encourages surgeries to be distinguished based on the initial frames of laparoscopic videos. The second, referred to as 'Future-State Predicting LSTM', trains an LSTM to predict information related to future frames, which helps distinguish between the different types of surgery. We evaluate our approaches on a large dataset of 425 laparoscopic videos covering 9 types of surgery (Laparo425) and achieve an average accuracy of 75% after observing only the first 10 minutes of a surgery. These results are promising from a practical standpoint and encouraging for other types of image-guided surgery.
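The abstract does not give the training objective of the 'Future-State Predicting LSTM', so the following NumPy sketch shows only one plausible way to write such an objective. Assumptions, all hypothetical: the "future information" is the CNN feature vector `horizon` frames ahead, both heads are plain linear maps (`w_cls`, `w_fut`), the video-level label is supervised at every time step, and the auxiliary term is weighted by `lam`.

```python
import numpy as np


def softmax_cross_entropy(logits: np.ndarray, label: int) -> float:
    """Cross-entropy of one logit vector against an integer class label."""
    shifted = logits - logits.max()  # subtract max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-log_probs[label])


def future_state_loss(
    hidden: np.ndarray,    # (T, H) LSTM hidden states, one per observed frame
    features: np.ndarray,  # (T, D) CNN features of the same frames
    label: int,            # surgery type of the whole video
    w_cls: np.ndarray,     # (H, C) hypothetical linear classification head
    w_fut: np.ndarray,     # (H, D) hypothetical linear future-feature head
    horizon: int = 5,      # hypothetical look-ahead, in frames
    lam: float = 1.0,      # hypothetical weight of the auxiliary term
) -> float:
    T = hidden.shape[0]
    # Classify at every step, so that early hidden states are pushed to be informative.
    cls = float(np.mean([softmax_cross_entropy(hidden[t] @ w_cls, label) for t in range(T)]))
    # Auxiliary term: from h_t, regress the CNN feature of frame t + horizon.
    if T > horizon:
        pred = hidden[: T - horizon] @ w_fut
        target = features[horizon:]
        fut = float(np.mean((pred - target) ** 2))
    else:
        fut = 0.0  # sequence too short to have any future target
    return cls + lam * fut
```

In a real model both heads and the LSTM would be trained jointly by an autodiff framework; this function only evaluates the loss for given arrays.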
Abstract: Real-time algorithms for automatically recognizing surgical phases are needed to develop systems that can assist surgeons, enable better management of operating room (OR) resources, and consequently improve safety within the OR. State-of-the-art surgical phase recognition algorithms using laparoscopic videos rely on fully supervised training. This limits their potential for widespread application, since creating manual annotations is expensive given the numerous types of existing surgeries and the vast amount of laparoscopic video available. In this work, we propose a new self-supervised pre-training approach based on predicting the remaining surgery duration (RSD) from laparoscopic videos. The RSD prediction task is used to pre-train a convolutional neural network (CNN) and long short-term memory (LSTM) network end to end. Because RSD labels can be derived from the videos themselves, our approach uses all available data and reduces the reliance on annotated data, thereby facilitating the scaling up of surgical phase recognition algorithms to different kinds of surgery. Additionally, we present EndoN2N, an end-to-end trained CNN-LSTM model for surgical phase recognition, and evaluate our approach on a dataset of 120 cholecystectomy laparoscopic videos (Cholec120). This work also presents the first systematic study of self-supervised pre-training approaches aimed at understanding the amount of annotation required for surgical phase recognition. Interestingly, the proposed RSD pre-training approach improves performance even when all the training data is manually annotated, and it outperforms the only other pre-training approach for surgical phase recognition published in the literature to date. We also observe that end-to-end training of CNN-LSTM networks boosts surgical phase recognition performance.
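The RSD pretext task in the third abstract is self-supervised because its labels come from the video's own timeline: the remaining duration at a frame is the total duration minus the elapsed time. A minimal sketch of that label generation, assuming frames are sampled at a constant rate and that the recording ends when the surgery ends; the function name `rsd_targets` and the choice of minutes as the unit are illustrative, not taken from the paper.

```python
from typing import List


def rsd_targets(num_frames: int, fps: float) -> List[float]:
    """Remaining-surgery-duration label (in minutes) for every frame of one video.

    Computed only from the frame index and the sampling rate, so no manual
    annotation is required. Assumes the last frame marks the end of surgery.
    """
    if num_frames <= 0 or fps <= 0:
        raise ValueError("num_frames and fps must be positive")
    total_sec = (num_frames - 1) / fps
    return [(total_sec - t / fps) / 60.0 for t in range(num_frames)]


# A 3-minute video sampled at 1 frame per second has 181 frames.
labels = rsd_targets(181, 1.0)
print(labels[0], labels[60], labels[-1])  # 3.0 2.0 0.0
```

Any unlabeled recording can be turned into training pairs this way, which is what lets the pre-training stage use all available videos.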
Abstract: Objective: Accurate surgery duration estimation is necessary for optimal OR planning, which plays an important role in patient comfort and safety as well as in resource optimization. It is, however, challenging to predict surgery duration preoperatively, since it varies significantly depending on the patient's condition, the surgeon's skills, and the intraoperative situation. We present an approach for intraoperative estimation of the remaining surgery duration that is well suited for deployment in the OR. Methods: We propose a deep learning pipeline, named RSDNet, which automatically estimates the remaining surgery duration intraoperatively using only visual information from laparoscopic videos. A key feature of RSDNet is that it does not depend on any manual annotation during training. Results: The experimental results show that the proposed network significantly outperforms the method frequently used in surgical facilities for estimating surgery duration. Further, the generalizability of the approach is demonstrated by testing the pipeline on two large datasets containing different types of surgery: 120 cholecystectomy and 170 gastric bypass videos. Conclusion: Creating manual annotations requires expert knowledge and is time-consuming, especially given the numerous types of surgery performed in a hospital and the large number of laparoscopic videos available. Since the proposed pipeline does not rely on manual annotation, it scales easily to many types of surgery. Significance: Owing to its superior performance and its ability to scale efficiently to many kinds of surgery, RSDNet could be used to develop an improved OR management system.
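The fourth abstract does not name the method "frequently used in surgical facilities" or the evaluation metric. As an assumption for illustration only, the sketch below uses a common simple baseline (the historical mean duration minus the elapsed time, floored at zero) and scores it with mean absolute error in minutes; a learned estimator such as RSDNet would be compared by computing the same error on its own predictions. Both function names are hypothetical.

```python
from typing import List, Sequence


def mean_duration_baseline(past_durations: Sequence[float], elapsed: Sequence[float]) -> List[float]:
    """Hypothetical baseline: mean of past surgery durations minus elapsed time, floored at 0."""
    mean_dur = sum(past_durations) / len(past_durations)
    return [max(mean_dur - e, 0.0) for e in elapsed]


def mean_absolute_error(pred: Sequence[float], true: Sequence[float]) -> float:
    """Average absolute difference between predicted and true remaining durations."""
    if len(pred) != len(true) or not pred:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(pred)


# Past surgeries lasted 60, 80 and 100 min; the current one actually lasts 120 min.
elapsed = [0, 30, 60, 90]
true_rsd = [120 - e for e in elapsed]  # [120, 90, 60, 30]
pred = mean_duration_baseline([60, 80, 100], elapsed)
print(pred)                                  # [80.0, 50.0, 20.0, 0.0]
print(mean_absolute_error(pred, true_rsd))   # 37.5
```

The example shows the baseline's weakness: it ignores what is visible during the surgery, so a case that runs longer than average keeps accumulating error, which is the gap a video-based intraoperative estimator aims to close.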