Abstract:Surgical video segmentation is an important topic in surgical AI because it allows a model to understand the spatial information of a surgical scene. However, owing to the lack of annotated surgical data, surgical segmentation models suffer from limited performance. With the emergence of SAM2, a large foundation model for video segmentation trained on natural videos, zero-shot surgical video segmentation has become more realistic, but it remains largely unexplored. In this paper, we systematically evaluate the performance of SAM2 on the zero-shot surgical video segmentation task. We conduct experiments under different configurations, including different prompting strategies and robustness analyses, among others. Moreover, we conduct an empirical evaluation covering 9 datasets and 17 different types of surgery.
Abstract:This paper presents a novel approach for nonlinear assimilation called score-based sequential Langevin sampling (SSLS) within a recursive Bayesian framework. SSLS decomposes the assimilation process into a sequence of prediction and update steps, utilizing dynamic models for prediction and observation data for updating via score-based Langevin Monte Carlo. An annealing strategy is incorporated to enhance convergence and facilitate multi-modal sampling. The convergence of SSLS in TV-distance is analyzed under certain conditions, providing insights into error behavior related to hyper-parameters. Numerical examples demonstrate its outstanding performance in high-dimensional and nonlinear scenarios, as well as in situations with sparse or partial measurements. Furthermore, SSLS effectively quantifies the uncertainty associated with the estimated states, highlighting its potential for error calibration.
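To illustrate the update step described in the SSLS abstract above, the following is a minimal sketch of annealed Langevin Monte Carlo on a one-dimensional linear-Gaussian toy problem. It is not the authors' implementation. The score is written analytically here rather than learned, and the function names (`posterior_score`, `annealed_langevin`), the tempering schedule `betas`, and the step size are all assumptions chosen for illustration. The annealing used in the paper may take a different form.

```python
import math
import random


def posterior_score(x, prior_mean, prior_var, y, obs_var, beta=1.0):
    """Score (gradient of the log density) of N(prior_mean, prior_var)
    multiplied by the tempered likelihood N(y | x, obs_var) ** beta."""
    prior_term = -(x - prior_mean) / prior_var
    likelihood_term = beta * (y - x) / obs_var
    return prior_term + likelihood_term


def annealed_langevin(particles, prior_mean, prior_var, y, obs_var,
                      betas=(0.25, 0.5, 1.0), steps=200, step_size=0.01,
                      rng=None):
    """Move particles toward the posterior with unadjusted Langevin steps.

    The likelihood weight beta is raised in stages (hypothetical schedule),
    so early stages sample a flatter target before the full posterior.
    """
    rng = rng or random.Random(0)
    xs = list(particles)
    noise_scale = math.sqrt(2.0 * step_size)
    for beta in betas:
        for _ in range(steps):
            xs = [
                x
                + step_size * posterior_score(x, prior_mean, prior_var,
                                              y, obs_var, beta)
                + noise_scale * rng.gauss(0.0, 1.0)
                for x in xs
            ]
    return xs
```

For a prior N(0, 1) and an observation y = 2 with unit noise variance, the exact posterior is N(1, 0.5), so the returned particles should have a sample mean near 1 and a sample variance near 0.5, up to Monte Carlo and discretization error.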
Abstract:Surgical scene segmentation is a fundamental task for understanding robot-assisted laparoscopic surgery. Surgical scenes often contain various anatomical structures and surgical instruments, whose similar local textures and fine-grained structures make segmentation difficult. Vision-specific transformers are a promising approach to surgical scene understanding, but two main challenges remain. First, the absence of inner-patch information fusion leads to poor segmentation performance. Second, the specific characteristics of anatomy and instruments are not explicitly modeled. To tackle these challenges, we propose a novel Transformer-based framework with an Asymmetric Feature Enhancement module (TAFE), which enhances local information and then actively fuses the improved feature pyramid into the embeddings from the transformer encoders through a multi-scale interaction attention strategy. The proposed method outperforms state-of-the-art (SOTA) methods on several surgical segmentation tasks and additionally demonstrates its ability to recognize fine-grained structures. Code is available at https://github.com/cyuan-sjtu/ViT-asym.
Abstract:With recent studies of deep learning in scientific computation, the Physics-Informed Neural Networks (PINNs) method has drawn widespread attention for solving Partial Differential Equations (PDEs). Compared with traditional methods, PINNs can efficiently handle high-dimensional problems, but their accuracy is relatively low, especially for highly irregular problems. Inspired by the ideas of adaptive finite element methods and incremental learning, we propose GAS, a Gaussian mixture distribution-based adaptive sampling method for PINNs. During training, GAS uses the current residual information to generate a Gaussian mixture distribution from which additional points are sampled; these points are then used for training together with the historical data to speed up the convergence of the loss and achieve higher accuracy. Several numerical simulations on 2D and 10D problems show that GAS is a promising method that achieves state-of-the-art accuracy among deep solvers while being comparable with traditional numerical solvers.
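The residual-driven sampling idea in the GAS abstract above can be sketched roughly as follows. This is an illustrative simplification, not the authors' algorithm. It assumes isotropic Gaussian components centered on the collocation points with the largest absolute residuals, with mixture weights proportional to those residuals. The function name `sample_from_residual_mixture` and the parameters `n_components` and `sigma` are hypothetical.

```python
import random


def sample_from_residual_mixture(points, residuals, n_new, n_components=3,
                                 sigma=0.05, rng=None):
    """Draw n_new points from a Gaussian mixture built from PDE residuals.

    points    : list of coordinate tuples (current collocation points)
    residuals : PDE residual evaluated at each point
    The centers are the n_components points with the largest |residual|.
    """
    rng = rng or random.Random(0)
    ranked = sorted(zip(residuals, points),
                    key=lambda pair: abs(pair[0]), reverse=True)
    top = ranked[:n_components]
    centers = [p for _, p in top]
    weights = [abs(r) for r, _ in top]
    if sum(weights) == 0.0:
        # All residuals vanish: fall back to equal component weights.
        weights = [1.0] * len(centers)

    new_points = []
    for _ in range(n_new):
        # Pick a component, then perturb its center with Gaussian noise.
        center = rng.choices(centers, weights=weights, k=1)[0]
        new_points.append(tuple(rng.gauss(c, sigma) for c in center))
    return new_points
```

In a training loop, the returned points would be appended to the existing collocation set before the next round of optimization, so that regions with large residuals receive more samples.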
Abstract:Sharing patients' data over a blockchain network can help train an accurate deep learning model for better prediction of COVID-19 patients. However, privacy (e.g., data leakage) and security (e.g., reliability or trust of data) concerns are the main challenges for health care centers. To address these challenges, this article designs a privacy-preserving framework based on federated learning and blockchain. In the first step, we train the local model using a capsule network for the segmentation and classification of COVID-19 images. Segmentation aims to extract nodules, and classification is used to train the model. In the second step, we secure the local model through a homomorphic encryption scheme. The designed scheme encrypts and decrypts the gradients for federated learning. Moreover, to decentralize the model, we design a blockchain-based federated learning algorithm that aggregates the gradients and updates the local model. In this way, the proposed encryption scheme preserves the privacy of the data provider, and the blockchain guarantees the reliability of the shared data. The experimental results demonstrate the performance of the proposed scheme.
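The aggregation step mentioned in the abstract above can be illustrated with a plain weighted average of client gradients. This sketch deliberately omits the homomorphic encryption and the blockchain layer, and it is not the authors' algorithm. The function names `aggregate_gradients` and `apply_update`, the weighting by local dataset size, and the learning rate are assumptions made for illustration.

```python
def aggregate_gradients(client_grads, client_sizes):
    """Average per-client gradient vectors, weighted by local dataset size.

    client_grads : list of equal-length lists of floats, one per client
    client_sizes : number of local training samples held by each client
    """
    total = sum(client_sizes)
    dim = len(client_grads[0])
    return [
        sum(grad[i] * size for grad, size in zip(client_grads, client_sizes))
        / total
        for i in range(dim)
    ]


def apply_update(weights, grad, lr=0.1):
    """Take one gradient-descent step on the local model weights."""
    return [w - lr * g for w, g in zip(weights, grad)]
```

For example, two clients holding 1 and 3 samples with gradients `[1.0, 2.0]` and `[3.0, 4.0]` yield the aggregate `[2.5, 3.5]`. In the framework described above, the gradients would be encrypted before leaving each client, which this sketch does not show.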