Abstract:Summarizing patient clinical notes is vital for reducing documentation burdens. Current manual summarization makes medical staff struggle. We propose an automatic method using LLMs, but long inputs cause LLMs to lose context, reducing output quality especially in small size model. We used a 7B model, open-calm-7b, enhanced with Native Bayes Context Extend and a redesigned decoding mechanism to reference one sentence at a time, keeping inputs within context windows, 2048 tokens. Our improved model achieved near parity with Google's over 175B Gemini on ROUGE-L metrics with 200 samples, indicating strong performance using less resources, enhancing automated EMR summarization feasibility.
Abstract:The medical image processing field often encounters the critical issue of scarce annotated data. Transfer learning has emerged as a solution, yet how to select an adequate source task and effectively transfer the knowledge to the target task remains challenging. To address this, we propose a novel sequential transfer scheme with a task affinity metric tailored for medical images. Considering the characteristics of medical image segmentation tasks, we analyze the image and label similarity between tasks and compute the task affinity scores, which assess the relatedness among tasks. Based on this, we select appropriate source tasks and develop an effective sequential transfer strategy by incorporating intermediate source tasks to gradually narrow the domain discrepancy and minimize the transfer cost. Thereby we identify the best sequential transfer path for the given target task. Extensive experiments on three MRI medical datasets, FeTS 2022, iSeg-2019, and WMH, demonstrate the efficacy of our method in finding the best source sequence. Compared with directly transferring from a single source task, the sequential transfer results underline a significant improvement in target task performance, achieving an average of 2.58% gain in terms of segmentation Dice score, notably, 6.00% for FeTS 2022. Code is available at the git repository.
Abstract:Group re-identification (re-ID) aims to match groups with the same people under different cameras, mainly involves the challenges of group members and layout changes well. Most existing methods usually use the k-nearest neighbor algorithm to update node features to consider changes in group membership, but these methods cannot solve the problem of group layout changes. To this end, we propose a novel vision transformer based random walk framework for group re-ID. Specifically, we design a vision transformer based on a monocular depth estimation algorithm to construct a graph through the average depth value of pedestrian features to fully consider the impact of camera distance on group members relationships. In addition, we propose a random walk module to reconstruct the graph by calculating affinity scores between target and gallery images to remove pedestrians who do not belong to the current group. Experimental results show that our framework is superior to most methods.
Abstract:Point clouds are crucial for capturing three-dimensional data but often suffer from incompleteness due to limitations such as resolution and occlusion. Traditional methods typically rely on point-based approaches within discriminative frameworks for point cloud completion. In this paper, we introduce \textbf{Diffusion-Occ}, a novel framework for Diffusion Point Cloud Completion. Diffusion-Occ utilizes a two-stage coarse-to-fine approach. In the first stage, the Coarse Density Voxel Prediction Network (CDNet) processes partial points to predict coarse density voxels, streamlining global feature extraction through voxel classification, as opposed to previous regression-based methods. In the second stage, we introduce the Occupancy Generation Network (OccGen), a conditional occupancy diffusion model based on a transformer architecture and enhanced by our Point-Voxel Fuse (PVF) block. This block integrates coarse density voxels with partial points to leverage both global and local features for comprehensive completion. By thresholding the occupancy field, we convert it into a complete point cloud. Additionally, our method employs diverse training mixtures and efficient diffusion parameterization to enable effective one-step sampling during both training and inference. Experimental results demonstrate that Diffusion-Occ outperforms existing discriminative and generative methods.
Abstract:As the rapid development of computer vision and the emergence of powerful network backbones and architectures, the application of deep learning in medical imaging has become increasingly significant. Unlike natural images, medical images lack huge volumes of data but feature more modalities, making it difficult to train a general model that has satisfactory performance across various datasets. In practice, practitioners often suffer from manually creating and testing models combining independent backbones and architectures, which is a laborious and time-consuming process. We propose Flemme, a FLExible and Modular learning platform for MEdical images. Our platform separates encoders from the model architectures so that different models can be constructed via various combinations of supported encoders and architectures. We construct encoders using building blocks based on convolution, transformer, and state-space model (SSM) to process both 2D and 3D image patches. A base architecture is implemented following an encoder-decoder style, with several derived architectures for image segmentation, reconstruction, and generation tasks. In addition, we propose a general hierarchical architecture incorporating a pyramid loss to optimize and fuse vertical features. Experiments demonstrate that this simple design leads to an average improvement of 5.60% in Dice score and 7.81% in mean interaction of units (mIoU) for segmentation models, as well as an enhancement of 5.57% in peak signal-to-noise ratio (PSNR) and 8.22% in structural similarity (SSIM) for reconstruction models. We further utilize Flemme as an analytical tool to assess the effectiveness and efficiency of various encoders across different tasks. Code is available at https://github.com/wlsdzyzl/flemme.
Abstract:Budget allocation of marketplace levers, such as incentives for drivers and promotions for riders, has long been a technical and business challenge at Uber; understanding lever budget changes' impact and estimating cost efficiency to achieve predefined budgets is crucial, with the goal of optimal allocations that maximize business value; we introduce an end-to-end machine learning and optimization procedure to automate budget decision-making for cities, relying on feature store, model training and serving, optimizers, and backtesting; proposing state-of-the-art deep learning (DL) estimator based on S-Learner and a novel tensor B-Spline regression model, we solve high-dimensional optimization with ADMM and primal-dual interior point convex optimization, substantially improving Uber's resource allocation efficiency.
Abstract:Sampling is widely used in various point cloud tasks as it can effectively reduce resource consumption. Recently, some methods have proposed utilizing neural networks to optimize the sampling process for various task requirements. Currently, deep downsampling methods can be categorized into two main types: generative-based and score-based. Generative-based methods directly generate sampled point clouds using networks, whereas score-based methods assess the importance of points according to specific rules and then select sampled point clouds based on their scores. However, these methods often result in noticeable clustering effects in high-intensity feature areas, compromising their ability to preserve small-scale features and leading to the loss of some structures, thereby affecting the performance of subsequent tasks. In this paper, we propose REPS, a reconstruction-based scoring strategy that evaluates the importance of each vertex by removing and reconstructing them using surrounding vertices. Our reconstruction process comprises point reconstruction and shape reconstruction. The two aforementioned reconstruction methods effectively evaluate the importance of vertices by removing them at different scales for reconstruction. These reconstructions ensure that our method maintains the overall geometric features of the point cloud and avoids disturbing small-scale structures during sampling. Additionally, we propose the Global-Local Fusion Attention (GLFA) module, which aggregates local and global attention features of point clouds, ensuring high-quality reconstruction and sampling effects. Our method outperforms previous approaches in preserving the structural features of the sampled point clouds. Furthermore, abundant experimental results demonstrate the superior performance of our method across various common tasks.
Abstract:We introduce a novel approach for the reconstruction of tubular shapes from skeletal representations. Our method processes all skeletal points as a whole, eliminating the need for splitting input structure into multiple segments. We represent the tubular shape as a truncated signed distance function (TSDF) in a voxel hashing manner, in which the signed distance between a voxel center and the object is computed through a simple geometric algorithm. Our method does not involve any surface sampling scheme or solving large matrix equations, and therefore is a faster and more elegant solution for tubular shape reconstruction compared to other approaches. Experiments demonstrate the efficiency and effectiveness of the proposed method. Code is avaliable at https://github.com/wlsdzyzl/Dragon.
Abstract:Most aerial manipulators use serial rigid-link designs, which results in large forces when initiating contacts during manipulation and could cause flight stability difficulty. This limitation could potentially be improved by the compliance of continuum manipulators. To achieve this goal, we present the novel design of a compact, lightweight, and modular cable-driven continuum manipulator for aerial drones. We then derive a complete modeling framework for its kinematics, statics, and stiffness (compliance). The modeling framework can guide the control and design problems to integrate the manipulator to aerial drones. In addition, thanks to the derived stiffness (compliance) matrix, and using a low-cost IMU sensor to capture deformation angles, we present a simple method to estimate manipulation force at the tip of the manipulator. We report preliminary experimental validations of the hardware prototype, providing insights on its manipulation feasibility. We also report preliminary results of the IMU-based force estimation method.
Abstract:The equilibrium shape of a continuum robot is resulted from both its internal actuation and the external physical interaction with a surrounding environment. A fast and accurate shape estimation method (i) can be used as a feedback to compensate for more accurate motion; and (ii) can reveal rich information about physical interactions (e.g. instrument-anatomy contacts / forces during a surgery). From a prior work that demonstrated an offline calibration of continuum robots, we adopt its shape modal representation and error propagation models that include identification Jacobians. In this work, we present an iterative observer approach to enable online shape estimation. We develop a dual Extended Kalman Filter (EKF) to estimate both the robot state and the shape modal parameters. The dual EKF provides robust estimation on (i) the configuration space variables that are controllable and driven by internal actuation; and (ii) the modal coefficients representing homotopies of shape families that are governed by the physical interactions with the environment. We report results from simulation studies in this work, and plan to investigate methods in the future to use the proposed approach for predicting physical interactions.