Abstract: We tackle safe trajectory planning under Gaussian mixture model (GMM) uncertainty. Specifically, we use a GMM to model the multimodal behavior of obstacles' uncertain states. We then develop a mixed-integer conic approximation to the chance-constrained trajectory planning problem for deterministic linear systems and polyhedral obstacles. When the GMM moments are estimated from finite samples, we derive a tight concentration bound that guarantees the chance constraint holds with a desired confidence. Moreover, to limit the amount of constraint violation, we develop a Conditional Value-at-Risk (CVaR) counterpart of the chance constraints and derive tractable approximations for both known and estimated GMM moments. We verify our methods with state-of-the-art trajectory prediction algorithms and autonomous driving datasets.
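As a minimal illustration of the two probabilistic ingredients above, the sketch below evaluates the violation probability of a linear half-space constraint under a GMM (a weighted sum of Gaussian tail probabilities) and an empirical CVaR of sampled losses. The half-space form a^T xi <= b, the toy moments, and the sampling-based CVaR are our own illustrative assumptions, not the paper's mixed-integer formulation.

```python
import numpy as np
from scipy.stats import norm

def gmm_violation_prob(a, b, weights, means, covs):
    """P(a^T xi > b) for xi ~ sum_k w_k N(mu_k, Sigma_k).

    Within each component, a^T xi is Gaussian, so the mixture tail
    probability is a weighted sum of Gaussian tail probabilities.
    """
    p = 0.0
    for w, mu, Sigma in zip(weights, means, covs):
        m = a @ mu                      # component mean of a^T xi
        s = np.sqrt(a @ Sigma @ a)      # component std of a^T xi
        p += w * norm.sf((b - m) / s)   # upper-tail probability
    return p

def cvar_sample(losses, alpha):
    """Empirical CVaR_alpha: mean of the worst (1 - alpha) share of losses."""
    q = np.quantile(losses, alpha)
    return losses[losses >= q].mean()

# Toy two-component GMM obstacle state and half-space constraint a^T xi <= b.
weights = [0.6, 0.4]
means = [np.array([0.0, 0.0]), np.array([2.0, 1.0])]
covs = [0.2 * np.eye(2), 0.3 * np.eye(2)]
a, b = np.array([1.0, 0.0]), 3.0
print(gmm_violation_prob(a, b, weights, means, covs))  # keep below epsilon

rng = np.random.default_rng(0)
comp = rng.choice(2, size=5000, p=weights)
xi = np.array([rng.multivariate_normal(means[c], covs[c]) for c in comp])
print(cvar_sample(xi @ a - b, alpha=0.95))  # expected shortfall of violation
```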
Abstract: We address safe multi-robot interaction under uncertainty. In particular, we formulate a chance-constrained linear quadratic Gaussian game with coupling constraints and system uncertainties. We find a tractable reformulation of the game and propose a dual ascent algorithm. We prove that the algorithm converges to a generalized Nash equilibrium of the reformulated game, ensuring the satisfaction of the chance constraints. We test our method in driving simulations and real-world robot experiments. Our method ensures safety under uncertainty and generates less conservative trajectories than single-agent model predictive control.
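To make the dual ascent idea concrete, here is a minimal sketch with two agents, quadratic costs, and a single scalar coupling constraint. The costs and constraint are illustrative stand-ins for the paper's LQG game, chosen only so that each agent's best response has a closed form.

```python
import numpy as np

# Two agents with costs J_i(x_i) = 0.5 * (x_i - r_i)^2 and one coupling
# constraint x_1 + x_2 <= c (e.g., a shared-resource / collision proxy).
r = np.array([3.0, 2.0])   # each agent's unconstrained target
c = 4.0                    # coupling budget
lam, alpha = 0.0, 0.5      # Lagrange multiplier and dual step size

for _ in range(200):
    # Best response of each agent to the current multiplier:
    # argmin_x 0.5*(x - r_i)^2 + lam * x  =>  x_i = r_i - lam
    x = r - lam
    # Dual ascent on the coupling-constraint residual, projected to lam >= 0.
    lam = max(0.0, lam + alpha * (x.sum() - c))

print(x, lam, x.sum())  # converges with x.sum() -> c and lam -> 0.5 here
```

At the fixed point, the shared multiplier prices the coupling constraint for both agents, which is the variational flavor of generalized Nash equilibrium the algorithm targets.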
Abstract: The Medical Segment Anything Model (MedSAM) has shown remarkable performance in medical image segmentation, drawing significant attention in the field. However, its sensitivity to the type and location of prompts poses challenges. This paper addresses these challenges by developing reliable prompts that enhance MedSAM's accuracy. We introduce MedSAM-U, an uncertainty-guided framework designed to automatically refine multi-prompt inputs for more reliable and precise medical image segmentation. Specifically, we first train a Multi-Prompt Adapter integrated with MedSAM, creating MPA-MedSAM, to adapt to diverse multi-prompt inputs. We then employ an uncertainty-guided multi-prompt strategy to estimate the uncertainties associated with the prompts and their initial segmentation results. A novel uncertainty-guided prompt adaptation technique is then applied automatically to derive reliable prompts and their corresponding segmentation outcomes. We validate MedSAM-U using datasets from multiple modalities to train a universal image segmentation model. Experimental results on five datasets of distinct modalities demonstrate that, compared to MedSAM, the proposed MedSAM-U achieves an average performance improvement of 1.7\% to 20.5\% with uncertainty-guided prompts.
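A minimal sketch of one plausible uncertainty-guided selection step: score each candidate prompt's probability map by mean predictive entropy and prefer the most confident. The entropy criterion and the synthetic maps are our assumptions; the actual MPA-MedSAM adaptation is more involved.

```python
import numpy as np

def entropy_map(p, eps=1e-8):
    """Per-pixel binary predictive entropy of a probability map."""
    return -(p * np.log(p + eps) + (1 - p) * np.log(1 - p + eps))

def rank_prompts(prob_maps):
    """Return prompt indices sorted by mean entropy (most confident first)."""
    scores = [entropy_map(p).mean() for p in prob_maps]
    return np.argsort(scores)

# Toy example: three candidate prompts' segmentation probability maps.
rng = np.random.default_rng(0)
maps = [np.clip(rng.normal(m, 0.1, (64, 64)), 1e-3, 1 - 1e-3)
        for m in (0.9, 0.6, 0.5)]
print(rank_prompts(maps))  # the sharpest map (around 0.9) ranks first
```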
Abstract: Medical report grounding is pivotal in identifying the most relevant regions in medical images based on a given phrase query, a critical aspect of medical image analysis and radiological diagnosis. However, prevailing visual grounding approaches require the manual extraction of key phrases from medical reports, imposing substantial burdens on both system efficiency and physicians. In this paper, we introduce Medical Report Grounding (MedRG), an end-to-end framework that uses a multimodal large language model to predict key phrases by incorporating a unique token, BOX, into the vocabulary, whose embedding unlocks detection capabilities. Subsequently, the vision encoder-decoder jointly decodes the hidden BOX embedding and the input medical image, generating the corresponding grounding box. Experimental results validate the effectiveness of MedRG, which surpasses existing state-of-the-art medical phrase grounding methods. To the best of our knowledge, this study is the first exploration of the medical report grounding task.
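The vocabulary-extension step described above can be illustrated with the standard Hugging Face API. The checkpoint name and the `<BOX>` token spelling below are placeholders, and the box-regression head that would consume the token's hidden state is omitted.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Illustrative vocabulary-extension step: register a special <BOX> token
# whose hidden state can later be decoded into a grounding box. The model
# checkpoint is a placeholder, not the one used by MedRG.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

num_added = tokenizer.add_special_tokens(
    {"additional_special_tokens": ["<BOX>"]})
model.resize_token_embeddings(len(tokenizer))  # grow the embedding matrix

box_id = tokenizer.convert_tokens_to_ids("<BOX>")
# At inference, the hidden state at positions where the model emits <BOX>
# would be passed to a box-regression head (not shown).
```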
Abstract: Interpretability is a key concern when estimating heterogeneous treatment effects with machine learning methods, especially in healthcare applications where high-stakes decisions are often made. Inspired by the Predictive, Descriptive, Relevant framework of interpretability, we propose causal rule learning, which finds a refined set of causal rules characterizing potential subgroups in order to estimate, and enhance our understanding of, heterogeneous treatment effects. Causal rule learning involves three phases: rule discovery, rule selection, and rule analysis. In the rule discovery phase, we utilize a causal forest to generate a pool of causal rules with corresponding subgroup average treatment effects. The selection phase then employs a D-learning method to select a subset of these rules that deconstructs individual-level treatment effects as a linear combination of subgroup-level effects. This answers a question ignored by previous literature: what if an individual simultaneously belongs to multiple groups with different average treatment effects? The rule analysis phase outlines a detailed procedure for further analyzing each selected rule from multiple perspectives, revealing the most promising rules for further validation. The rules themselves, their corresponding subgroup treatment effects, and their weights in the linear combination give us more insight into heterogeneous treatment effects. Simulation and real-world data analysis demonstrate the superior performance of causal rule learning for interpretable estimation of heterogeneous treatment effects when the ground truth is complex and the sample size is sufficient.
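As a sketch of the selection phase's linear decomposition, the snippet below expresses estimated individual effects as a sparse combination of rule memberships. A Lasso stands in for the paper's D-learning criterion purely for illustration, and the rule matrix and effects are synthetic.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Surrogate for the rule-selection phase: tau_i ~ R_i @ beta, where R is a
# binary rule-membership matrix and beta holds subgroup-level weights.
rng = np.random.default_rng(1)
n, m = 500, 8
R = rng.integers(0, 2, (n, m)).astype(float)    # rule-membership indicators
beta_true = np.array([1.5, 0.0, -0.8, 0.0, 0.0, 0.6, 0.0, 0.0])
tau_hat = R @ beta_true + rng.normal(0, 0.1, n)  # e.g., causal-forest ITEs

sel = Lasso(alpha=0.02, fit_intercept=False).fit(R, tau_hat)
print(np.round(sel.coef_, 2))  # nonzero weights pick out the active rules
# An individual belonging to several rules gets the sum of their weights,
# which is exactly the multi-membership question raised in the abstract.
```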
Abstract: To better care for the elderly and disabled, service robots need an effective method for fusing object detection and grasp estimation. However, there has been limited research on combining the two. To overcome this technical difficulty, this paper proposes a novel integrated detection-grasping method for specific objects based on box coordinate matching. First, the SOLOv2 instance segmentation model is improved by adding a channel attention module (CAM) and a spatial attention module (SAM). Then, atrous spatial pyramid pooling (ASPP) and CAM are added to the generative residual convolutional neural network (GR-CNN) model to optimize grasp estimation. Furthermore, a detection-grasping integrated algorithm based on box coordinate matching (DG-BCM) is proposed to fuse object detection and grasp estimation. For verification, experiments on object detection and grasp estimation are conducted separately to demonstrate the superiority of the improved models. Additionally, grasping tasks for several specific objects are carried out on a simulation platform, demonstrating the feasibility and effectiveness of the proposed DG-BCM algorithm.
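One plausible reading of box coordinate matching, as a sketch: keep the highest-quality grasp whose center falls inside the detector's box for the target object. The tuple layouts and the containment test are our assumptions, not the exact DG-BCM rule.

```python
import numpy as np

def match_grasp_to_box(box, grasps):
    """Pick the highest-quality grasp whose center lies inside the box.

    box:    (x1, y1, x2, y2) from the detector
    grasps: rows (cx, cy, angle, width, quality) from the grasp network
    Returns the selected grasp row, or None if no center falls inside.
    """
    x1, y1, x2, y2 = box
    inside = ((grasps[:, 0] >= x1) & (grasps[:, 0] <= x2) &
              (grasps[:, 1] >= y1) & (grasps[:, 1] <= y2))
    if not inside.any():
        return None
    cand = grasps[inside]
    return cand[np.argmax(cand[:, 4])]

box = (40, 30, 120, 100)
grasps = np.array([[80, 60, 0.3, 25, 0.92],
                   [10, 10, 1.1, 30, 0.99],   # high quality but outside box
                   [55, 45, -0.5, 20, 0.70]])
print(match_grasp_to_box(box, grasps))  # -> the 0.92-quality grasp
```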
Abstract: Recently, the Segment Anything Model (SAM) has taken an important step toward general artificial intelligence. At the same time, its reliability and fairness have attracted great attention, especially in health care. In this study, we propose multi-box-prompt-triggered uncertainty estimation for SAM to assess the reliability of segmented lesions and tissues. We estimate the distribution of SAM predictions via Monte Carlo simulation with prior distribution parameters, employing different prompts as a form of test-time augmentation. Our experimental results show that multi-box prompt augmentation improves SAM's performance and endows each pixel with an uncertainty estimate. This provides a first paradigm for a reliable SAM.
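A minimal sketch of the prompt-level test-time augmentation: jitter the box prompt, average the resulting probability maps, and take per-pixel variance as the uncertainty. The `sam_predict` wrapper is hypothetical; a dummy predictor is included so the sketch runs end to end.

```python
import numpy as np

def multi_box_uncertainty(sam_predict, image, box, n=16, jitter=5.0, seed=0):
    """Test-time augmentation over box prompts.

    sam_predict(image, box) -> probability map in [0, 1] is a hypothetical
    wrapper around a SAM-style model. We jitter the box corners, average
    the predictions, and take per-pixel variance as the uncertainty map.
    """
    rng = np.random.default_rng(seed)
    preds = np.stack([
        sam_predict(image, np.asarray(box, float) + rng.normal(0, jitter, 4))
        for _ in range(n)])
    return preds.mean(axis=0), preds.var(axis=0)  # mask estimate, uncertainty

# Dummy predictor so the sketch runs end to end: "segment" inside the box.
def dummy_predict(image, box):
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    x1, y1, x2, y2 = box
    return ((xs >= x1) & (xs <= x2) & (ys >= y1) & (ys <= y2)).astype(float)

mean_mask, unc = multi_box_uncertainty(dummy_predict, np.zeros((64, 64)),
                                       (10, 10, 40, 40))
print(unc.max())  # uncertainty concentrates near the jittered box edges
```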
Abstract: Classification and segmentation are crucial in medical image analysis as they enable accurate diagnosis and disease monitoring. However, current methods often prioritize mutually learned features and shared model parameters while neglecting the reliability of features and performance. In this paper, we propose a novel Uncertainty-informed Mutual Learning (UML) framework for reliable and interpretable medical image analysis. UML introduces reliability into joint classification and segmentation, leveraging mutual learning with uncertainty to improve performance. To achieve this, we first use evidential deep learning to provide image-level and pixel-wise confidences. Then, an Uncertainty Navigator Decoder is constructed to better exploit mutual features and generate segmentation results, and an Uncertainty Instructor is proposed to screen reliable masks for classification. Overall, UML produces confidence estimates for the features and performance of each task (classification and segmentation). Experiments on public datasets demonstrate that UML outperforms existing methods in terms of both accuracy and robustness, and it has the potential to foster the development of more reliable and explainable medical image analysis models. We will release the code for reproduction after acceptance.
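For the evidential step, the snippet below follows the standard subjective-logic reading of evidential deep learning (evidence, then Dirichlet parameters, then belief and uncertainty mass). It is the generic formulation, not UML's exact classification and segmentation heads.

```python
import numpy as np

def evidential_confidence(logits):
    """Standard evidential-deep-learning confidence from raw logits.

    evidence e_k = softplus(logit_k) >= 0, Dirichlet alpha_k = e_k + 1,
    belief b_k = e_k / S, uncertainty mass u = K / S, with S = sum_k alpha_k.
    """
    e = np.logaddexp(0.0, logits)   # softplus evidence, elementwise
    alpha = e + 1.0
    S = alpha.sum()
    belief = e / S
    u = len(logits) / S             # high when total evidence is low
    return belief, u

print(evidential_confidence(np.array([4.0, -2.0])))   # confident prediction
print(evidential_confidence(np.array([0.1, 0.05])))   # uncertain prediction
```

Applied per image, this yields the image-level confidence; applied per pixel to segmentation logits, it yields the pixel-wise confidence map.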
Abstract: Multi-object tracking is a fundamental computer vision problem, widely used in public safety, transport, autonomous vehicles, robotics, and other domains involving artificial intelligence. Because of the complexity of natural scenes, object occlusion and semi-occlusion commonly occur in tracking tasks. These can easily lead to ID switching, object loss, detection errors, and misaligned bounding boxes, all of which significantly degrade the precision of multi-object tracking. In this paper, we design a new multi-object tracker for these issues that contains an object \textbf{Relative Location Mapping} (RLM) model and a \textbf{Target Region Density} (TRD) model. The new tracker is more sensitive to the positional relationships between objects, and it can introduce low-score detections into different regions in real time according to the density of object regions in the video. This improves the accuracy of object tracking without consuming extensive computational resources. Our study shows that the proposed model considerably improves HOTA and IDF1 on the MOT17 and MOT20 datasets when applied to an advanced MOT method.
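A sketch of the density-driven use of low-score detections: lower the keep-threshold for detections that sit in crowded regions, where occlusion makes low scores more forgivable. The radius, thresholds, and neighbor count are illustrative stand-ins for the TRD model.

```python
import numpy as np

def density_adaptive_keep(dets, radius=50.0, t_hi=0.6, t_lo=0.3, k=3):
    """Keep detections with a score threshold that drops in crowded regions.

    dets: rows (cx, cy, score). A detection with >= k neighbors within
    `radius` is kept at the low threshold (likely an occluded target);
    otherwise the high threshold applies.
    """
    xy = dets[:, :2]
    d = np.linalg.norm(xy[:, None] - xy[None, :], axis=-1)
    neighbors = (d < radius).sum(axis=1) - 1        # exclude self
    thresh = np.where(neighbors >= k, t_lo, t_hi)
    return dets[dets[:, 2] >= thresh]

dets = np.array([[100, 100, 0.35], [110,  95, 0.80], [105, 120, 0.70],
                 [112, 101, 0.65], [400, 300, 0.35]])
print(density_adaptive_keep(dets))  # crowded 0.35 kept, isolated 0.35 dropped
```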
Abstract: This paper proposes a passive WiFi indoor localization scheme. Instead of using WiFi signals received by mobile devices as fingerprints, we use signals received by routers to locate the mobile carrier; consequently, no software installation on the mobile device is required. To resolve the data insufficiency problem, flow control signals such as request to send (RTS) and clear to send (CTS) are utilized. In our model, the received signal strength indicator (RSSI) and channel state information (CSI) are used as fingerprints for several localization algorithms, including deterministic, probabilistic, and neural network approaches. We further investigate localization performance through extensive on-site experiments with various phone models at hundreds of test locations. We demonstrate that our passive scheme achieves an average localization error of 0.8 m when the phone is actively transmitting data frames and 1.5 m when it is not.
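As a concrete example of the deterministic fingerprinting baseline mentioned above, here is a weighted k-nearest-neighbor localizer over an RSSI fingerprint database; the toy database, inverse-distance weighting, and measurement are our own illustrative choices.

```python
import numpy as np

def knn_localize(fingerprints, locations, rssi, k=3, eps=1e-6):
    """Weighted k-NN over an RSSI fingerprint database.

    fingerprints: (n, d) RSSI vectors (one column per router),
    locations:    (n, 2) known coordinates of those fingerprints,
    rssi:         (d,) router-side measurement for the unknown phone.
    """
    dist = np.linalg.norm(fingerprints - rssi, axis=1)
    idx = np.argsort(dist)[:k]
    w = 1.0 / (dist[idx] + eps)            # closer fingerprints weigh more
    return (w[:, None] * locations[idx]).sum(axis=0) / w.sum()

fp = np.array([[-40, -70, -60], [-55, -65, -50], [-70, -50, -45]], float)
loc = np.array([[0, 0], [5, 0], [10, 5]], float)
print(knn_localize(fp, loc, np.array([-50.0, -66.0, -55.0]), k=2))
```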