Abstract:Self-consistency-based approaches, which involve repeatedly sampling multiple outputs and selecting the most consistent one as the final response, prove to be remarkably effective in improving the factual accuracy of large language models. Nonetheless, existing methods usually have strict constraints on the task format, largely limiting their applicability. In this paper, we present Integrative Decoding (ID), to unlock the potential of self-consistency in open-ended generation tasks. ID operates by constructing a set of inputs, each prepended with a previously sampled response, and then processes them concurrently, with the next token being selected by aggregating of all their corresponding predictions at each decoding step. In essence, this simple approach implicitly incorporates self-consistency in the decoding objective. Extensive evaluation shows that ID consistently enhances factuality over a wide range of language models, with substantial improvements on the TruthfulQA (+11.2%), Biographies (+15.4%) and LongFact (+8.5%) benchmarks. The performance gains amplify progressively as the number of sampled responses increases, indicating the potential of ID to scale up with repeated sampling.
Abstract:Surgical automation has the capability to improve the consistency of patient outcomes and broaden access to advanced surgical care in underprivileged communities. Shared autonomy, where the robot automates routine subtasks while the surgeon retains partial teleoperative control, offers great potential to make an impact. In this paper we focus on one important skill within surgical shared autonomy: Automating robotic assistance to maximize visual exposure and apply tissue tension for dissection and cautery. Ensuring consistent exposure to visualize the surgical site is crucial for both efficiency and patient safety. However, achieving this is highly challenging due to the complexities of manipulating deformable volumetric tissues that are prevalent in surgery.To address these challenges we propose \methodname, a framework for autonomous surgical robotic assistance to \methodfullname. We integrate a differentiable physics model with perceptual feedback to achieve our two key objectives: 1) Maximizing tissue exposure and applying tension for a specified dissection site through visual-servoing conrol and 2) Selecting optimal control positions for a dissection target based on deformable Jacobian analysis. We quantitatively assess our method through repeated real robot experiments on a tissue phantom, and showcase its capabilities through dissection experiments using shared autonomy on real animal tissue.
Abstract:Chronic wounds, including diabetic ulcers, pressure ulcers, and ulcers secondary to venous hypertension, affects more than 6.5 million patients and a yearly cost of more than $25 billion in the United States alone. Chronic wound treatment is currently a manual process, and we envision a future where robotics and automation will aid in this treatment to reduce cost and improve patient care. In this work, we present the development of the first robotic system for wound dressing removal which is reported to be the worst aspect of living with chronic wounds. Our method leverages differentiable physics-based simulation to perform gradient-based Model Predictive Control (MPC) for optimized trajectory planning. By integrating fracture mechanics of adhesion, we are able to model the peeling effect inherent to dressing adhesion. The system is further guided by carefully designed objective functions that promote both efficient and safe control, reducing the risk of tissue damage. We validated the efficacy of our approach through a series of experiments conducted on both synthetic skin phantoms and real human subjects. Our results demonstrate the system's ability to achieve precise and safe dressing removal trajectories, offering a promising solution for automating this essential healthcare procedure.
Abstract:Current robotic manipulators require fast and efficient motion-planning algorithms to operate in cluttered environments. State-of-the-art sampling-based motion planners struggle to scale to high-dimensional configuration spaces and are inefficient in complex environments. This inefficiency arises because these planners utilize either uniform or hand-crafted sampling heuristics within the configuration space. To address these challenges, we present the Spatial-informed Motion Planning Network (SIMPNet). SIMPNet consists of a stochastic graph neural network (GNN)-based sampling heuristic for informed sampling within the configuration space. The sampling heuristic of SIMPNet encodes the workspace embedding into the configuration space through a cross-attention mechanism. It encodes the manipulator's kinematic structure into a graph, which is used to generate informed samples within the framework of sampling-based motion planning algorithms. We have evaluated the performance of SIMPNet using a UR5e robotic manipulator operating within simple and complex workspaces, comparing it against baseline state-of-the-art motion planners. The evaluation results show the effectiveness and advantages of the proposed planner compared to the baseline planners.
Abstract:This paper introduces a unique force control and adaptation algorithm for a lightweight and low-complexity five-fingered robotic hand, namely an Integrated-Finger Robotic Hand (IFRH). The force control and adaptation algorithm is intuitive to design, easy to implement, and improves the grasping functionality through feedforward adaptation automatically. Specifically, we have extended Youla-parameterization which is traditionally used in feedback controller design into a feedforward iterative learning control algorithm (ILC). The uniqueness of such an extension is that both the feedback and feedforward controllers are parameterized over one unified design parameter which can be easily customized based on the desired closed-loop performance. While Youla-parameterization and ILC have been explored in the past on various applications, our unique parameterization and computational methods make the design intuitive and easy to implement. This provides both robust and adaptive learning capabilities, and our application rivals the complexity of many robotic hand control systems. Extensive experimental tests have been conducted to validate the effectiveness of our method.
Abstract:Learning musical structures and composition patterns is necessary for both music generation and understanding, but current methods do not make uniform use of learned features to generate and comprehend music simultaneously. In this paper, we propose PianoBART, a pre-trained model that uses BART for both symbolic piano music generation and understanding. We devise a multi-level object selection strategy for different pre-training tasks of PianoBART, which can prevent information leakage or loss and enhance learning ability. The musical semantics captured in pre-training are fine-tuned for music generation and understanding tasks. Experiments demonstrate that PianoBART efficiently learns musical patterns and achieves outstanding performance in generating high-quality coherent pieces and comprehending music. Our code and supplementary material are available at https://github.com/RS2002/PianoBart.
Abstract:Large Language Models (LLMs) have shown superior performance in various applications and fields. To achieve better performance on specialized domains such as law and advertisement, LLMs are often continue pre-trained on in-domain data. However, existing approaches suffer from two major issues. First, in-domain data are scarce compared with general domain-agnostic data. Second, data used for continual pre-training are not task-aware, such that they may not be helpful to downstream applications. We propose TRAIT, a task-oriented in-domain data augmentation framework. Our framework is divided into two parts: in-domain data selection and task-oriented synthetic passage generation. The data selection strategy identifies and selects a large amount of in-domain data from general corpora, and thus significantly enriches domain knowledge in the continual pre-training data. The synthetic passages contain guidance on how to use domain knowledge to answer questions about downstream tasks. By training on such passages, the model aligns with the need of downstream applications. We adapt LLMs to two domains: advertisement and math. On average, TRAIT improves LLM performance by 8% in the advertisement domain and 7.5% in the math domain.
Abstract:Industrial manipulators have extensively collaborated with human operators to execute tasks, e.g., disassembly of end-of-use products, in intelligent remanufacturing. A safety task execution requires real-time path planning for the manipulator's end-effector to autonomously avoid human operators. This is even more challenging when the end-effector needs to follow a planned path while avoiding the collision between the manipulator body and human operators, which is usually computationally expensive and limits real-time application. This paper proposes an efficient hybrid motion planning algorithm that consists of an A$^*$ algorithm and an online manipulator reconfiguration mechanism (OMRM) to tackle such challenges in task and configuration spaces respectively. The A$^*$ algorithm is first leveraged to plan the shortest collision-free path of the end-effector in task space. When the manipulator body is risky to the human operator, our OMRM then selects an alternative joint configuration with minimum reconfiguration effort from a database to assist the manipulator to follow the planned path and avoid the human operator simultaneously. The database of manipulator reconfiguration establishes the relationship between the task and configuration space offline using forward kinematics, and is able to provide multiple reconfiguration candidates for a desired end-effector's position. The proposed new hybrid algorithm plans safe manipulator motion during the whole task execution. Extensive numerical and experimental studies, as well as comparison studies between the proposed one and the state-of-the-art ones, have been conducted to validate the proposed motion planning algorithm.
Abstract:Surgical automation can improve the accessibility and consistency of life saving procedures. Most surgeries require separating layers of tissue to access the surgical site, and suturing to reattach incisions. These tasks involve deformable manipulation to safely identify and alter tissue attachment (boundary) topology. Due to poor visual acuity and frequent occlusions, surgeons tend to carefully manipulate the tissue in ways that enable inference of the tissue's attachment points without causing unsafe tearing. In a similar fashion, we propose JIGGLE, a framework for estimation and interactive sensing of unknown boundary parameters in deformable surgical environments. This framework has two key components: (1) a probabilistic estimation to identify the current attachment points, achieved by integrating a differentiable soft-body simulator with an extended Kalman filter (EKF), and (2) an optimization-based active control pipeline that generates actions to maximize information gain of the tissue attachments, while simultaneously minimizing safety costs. The robustness of our estimation approach is demonstrated through experiments with real animal tissue, where we infer sutured attachment points using stereo endoscope observations. We also demonstrate the capabilities of our method in handling complex topological changes such as cutting and suturing.
Abstract:There has been a growing utilization of industrial robots as complementary collaborators for human workers in re-manufacturing sites. Such a human-robot collaboration (HRC) aims to assist human workers in improving the flexibility and efficiency of labor-intensive tasks. In this paper, we propose a human-aware motion planning framework for HRC to effectively compute collision-free motions for manipulators when conducting collaborative tasks with humans. We employ a neural human motion prediction model to enable proactive planning for manipulators. Particularly, rather than blindly trusting and utilizing predicted human trajectories in the manipulator planning, we quantify uncertainties of the neural prediction model to further ensure human safety. Moreover, we integrate the uncertainty-aware prediction into a graph that captures key workspace elements and illustrates their interconnections. Then a graph neural network is leveraged to operate on the constructed graph. Consequently, robot motion planning considers both the dependencies among all the elements in the workspace and the potential influence of future movements of human workers. We experimentally validate the proposed planning framework using a 6-degree-of-freedom manipulator in a shared workspace where a human is performing disassembling tasks. The results demonstrate the benefits of our approach in terms of improving the smoothness and safety of HRC. A brief video introduction of this work is available as the supplemental materials.