Abstract:Precise grasp force regulation in tendon-driven surgical instruments is fundamentally limited by nonlinear coupling between motor dynamics, transmission compliance, friction, and distal mechanics. Existing solutions typically rely on distal force sensing or analytical compensation, increasing hardware complexity or degrading performance under dynamic motion. We present a sensorless control framework that combines physics-consistent modeling and hybrid reinforcement learning to achieve high-precision distal force regulation in a proximally actuated surgical end-effector. We develop a first-principles digital twin of the da Vinci Xi grasping mechanism that captures coupled electrical, transmission, and jaw dynamics within a unified differential-algebraic formulation. To safely learn control policies in this stiff and highly nonlinear system, we introduce a three-stage pipeline:(i)a receding-horizon CMA-ES oracle that generates dynamically feasible expert trajectories,(ii)fully offline policy learning via Implicit Q-Learning to ensure stable initialization without unsafe exploration, and (iii)online refinement using TD3 for adaptation to on-policy dynamics. The resulting policy directly maps proximal measurements to motor voltages and requires no distal sensing. In simulation, the controller maintains grasp force within 1% of the desired reference during multi-harmonic jaw motion. Hardware experiments demonstrate average force errors below 4% across diverse trajectories, validating sim-to-real transfer. The learned policy contains approximately 71k param and executes at kH rates, enabling real-time deployment. These results demonstrate that high-fidelity modeling combined with structured offline-online RL can recover precise distal force behavior without additional sensing, offering a scalable and mechanically compatible solution for surgical robotic manipulation.
Abstract:Recognizing surgical phases and steps from video is a fundamental problem in computer-assisted interventions. Recent approaches increasingly rely on large-scale pre-training on thousands of labeled surgical videos, followed by zero-shot transfer to specific procedures. While effective, this strategy incurs substantial computational and data collection costs. In this work, we question whether such heavy pre-training is truly necessary. We propose Text-Augmented Action Segmentation Optimal Transport (TASOT), an unsupervised method for surgical phase and step recognition that extends Action Segmentation Optimal Transport (ASOT) by incorporating textual information generated directly from the videos. TASOT formulates temporal action segmentation as a multimodal optimal transport problem, where the matching cost is defined as a weighted combination of visual and text-based costs. The visual term captures frame-level appearance similarity, while the text term provides complementary semantic cues, and both are jointly regularized through a temporally consistent unbalanced Gromov-Wasserstein formulation. This design enables effective alignment between video frames and surgical actions without surgical-specific pretraining or external web-scale supervision. We evaluate TASOT on multiple benchmark surgical datasets and observe consistent and substantial improvements over existing zero-shot methods, including StrasBypass70 (+23.7), BernBypass70 (+4.5), Cholec80 (+16.5), and AutoLaparo (+19.6). These results demonstrate that fine-grained surgical understanding can be achieved by exploiting information already present in standard visual and textual representations, without resorting to increasingly complex pre-training pipelines. The code will be available at https://github.com/omar8ahmed9/TASOT.
Abstract:Manual endoscopic submucosal dissection (ESD) is technically demanding, and existing single-segment robotic tools offer limited dexterity. These limitations motivate the development of more advanced solutions. To address this, DESectBot, a novel dual segment continuum robot with a decoupled structure and integrated surgical forceps, enabling 6 degrees of freedom (DoFs) tip dexterity for improved lesion targeting in ESD, was developed in this work. Deep learning controllers based on gated recurrent units (GRUs) for simultaneous tip position and orientation control, effectively handling the nonlinear coupling between continuum segments, were proposed. The GRU controller was benchmarked against Jacobian based inverse kinematics, model predictive control (MPC), a feedforward neural network (FNN), and a long short-term memory (LSTM) network. In nested-rectangle and Lissajous trajectory tracking tasks, the GRU achieved the lowest position/orientation RMSEs: 1.11 mm/ 4.62° and 0.81 mm/ 2.59°, respectively. For orientation control at a fixed position (four target poses), the GRU attained a mean RMSE of 0.14 mm and 0.72°, outperforming all alternatives. In a peg transfer task, the GRU achieved a 100% success rate (120 success/120 attempts) with an average transfer time of 11.8s, the STD significantly outperforms novice-controlled systems. Additionally, an ex vivo ESD demonstration grasping, elevating, and resecting tissue as the scalpel completed the cut confirmed that DESectBot provides sufficient stiffness to divide thick gastric mucosa and an operative workspace adequate for large lesions.These results confirm that GRU-based control significantly enhances precision, reliability, and usability in ESD surgical training scenarios.




Abstract:Continuum robots, inspired by octopus arms and elephant trunks, combine dexterity with intrinsic compliance, making them well suited for unstructured and confined environments. Yet their continuously deformable morphology poses challenges for motion planning and control, calling for accurate but lightweight models. We propose the Lightweight Actuation Space Energy Modeling (LASEM) framework for cable driven continuum robots, which formulates actuation potential energy directly in actuation space. LASEM yields an analytical forward model derived from geometrically nonlinear beam and rod theories via Hamilton's principle, while avoiding explicit modeling of cable backbone contact. It accepts both force and displacement inputs, thereby unifying kinematic and static formulations. Assuming the friction is neglected, the framework generalizes to nonuniform geometries, arbitrary cable routings, distributed loading and axial extensibility, while remaining computationally efficient for real-time use. Numerical simulations validate its accuracy, and a semi-analytical iterative scheme is developed for inverse kinematics. To address discretization in practical robots, LASEM further reformulates the functional minimization as a numerical optimization, which also naturally incorporates cable potential energy without explicit contact modeling.
Abstract:Soft robots, compared to rigid robots, possess inherent advantages, including higher degrees of freedom, compliance, and enhanced safety, which have contributed to their increasing application across various fields. Among these benefits, adaptability is particularly noteworthy. In this paper, adaptability in soft robots is categorized into external and internal adaptability. External adaptability refers to the robot's ability to adjust, either passively or actively, to variations in environments, object properties, geometries, and task dynamics. Internal adaptability refers to the robot's ability to cope with internal variations, such as manufacturing tolerances or material aging, and to generalize control strategies across different robots. As the field of soft robotics continues to evolve, the significance of adaptability has become increasingly pronounced. In this review, we summarize various approaches to enhancing the adaptability of soft robots, including design, sensing, and control strategies. Additionally, we assess the impact of adaptability on applications such as surgery, wearable devices, locomotion, and manipulation. We also discuss the limitations of soft robotics adaptability and prospective directions for future research. By analyzing adaptability through the lenses of implementation, application, and challenges, this paper aims to provide a comprehensive understanding of this essential characteristic in soft robotics and its implications for diverse applications.




Abstract:The inherent challenges of robotic underwater exploration, such as hydrodynamic effects, the complexity of dynamic coupling, and the necessity for sensitive interaction with marine life, call for the adoption of soft robotic approaches in marine exploration. To address this, we present a novel prototype, ZodiAq, a soft underwater drone inspired by prokaryotic bacterial flagella. ZodiAq's unique dodecahedral structure, equipped with 12 flagella-like arms, ensures design redundancy and compliance, ideal for navigating complex underwater terrains. The prototype features a central unit based on a Raspberry Pi, connected to a sensory system for inertial, depth, and vision detection, and an acoustic modem for communication. Combined with the implemented control law, it renders ZodiAq an intelligent system. This paper details the design and fabrication process of ZodiAq, highlighting design choices and prototype capabilities. Based on the strain-based modeling of Cosserat rods, we have developed a digital twin of the prototype within a simulation toolbox to ease analysis and control. To optimize its operation in dynamic aquatic conditions, a simplified model-based controller has been developed and implemented, facilitating intelligent and adaptive movement in the hydrodynamic environment. Extensive experimental demonstrations highlight the drone's potential, showcasing its design redundancy, embodied intelligence, crawling gait, and practical applications in diverse underwater settings. This research contributes significantly to the field of underwater soft robotics, offering a promising new avenue for safe, efficient, and environmentally conscious underwater exploration.




Abstract:This study presents acoustic-based methods for the control of multiple autonomous underwater vehicles (AUV). This study proposes two different models for implementing boundary and path control on low-cost AUVs using acoustic communication and a single central acoustic beacon. Two methods are presented: the Range Variation-Based (RVB) model completely relies on range data obtained by acoustic modems, whereas the Heading Estimation-Based (HEB) model uses ranges and range rates to estimate the position of the central boundary beacon and perform assigned behaviors. The models are tested on two boundary control behaviors: Fencing and Milling. Fencing behavior ensures AUVs return within predefined boundaries, while Milling enables the AUVs to move cyclically on a predefined path around the beacon. Models are validated by successfully performing the boundary control behaviors in simulations, pool tests, including artificial underwater currents, and field tests conducted in the ocean. All tests were performed with fully autonomous platforms, and no external input or sensor was provided to the AUVs during validation. Quantitative and qualitative analyses are presented in the study, focusing on the effect and application of a multi-robot system.




Abstract:Modular soft robot arms (MSRAs) are composed of multiple independent modules connected in a sequence. Due to their modular structure and high degrees of freedom (DOFs), these modules can simultaneously bend at different angles in various directions, enabling complex deformation. This capability allows MSRAs to perform more intricate tasks than single module robots. However, the modular structure also induces challenges in accurate planning, modeling, and control. Nonlinearity, hysteresis, and gravity complicate the physical model, while the modular structure and increased DOFs further lead to accumulative errors along the sequence. To address these challenges, we propose a flexible task space planning and control strategy for MSRAs, named S2C2A (State to Configuration to Action). Our approach formulates an optimization problem, S2C (State to Configuration planning), which integrates various loss functions and a forward MSRA model to generate configuration trajectories based on target MSRA states. Given the model complexity, we leverage a biLSTM network as the forward model. Subsequently, a configuration controller C2A (Configuration to Action control) is implemented to follow the planned configuration trajectories, leveraging only inaccurate internal sensing feedback. Both a biLSTM network and a physical model are utilized for configuration control. We validated our strategy using a cable-driven MSRA, demonstrating its ability to perform diverse offline tasks such as position control, orientation control, and obstacle avoidance. Furthermore, our strategy endows MSRA with online interaction capability with targets and obstacles. Future work will focus on addressing MSRA challenges, such as developing more accurate physical models and reducing configuration estimation errors along the module sequence.
Abstract:In this paper we show the application of the new robotic multi-platform system HSURF to a specific use case of teleoperation, aimed at monitoring and inspection. The HSURF system, consists of 3 different kinds of platforms: floater, sinker and robotic fishes. The collaborative control of the 3 platforms allows a remotely based operator to control the fish in order to visit and inspect several targets underwater following a complex trajectory. A shared autonomy solution shows to be the most suitable, in order to minimize the effect of limited bandwidth and relevant delay intrinsic to acoustic communications. The control architecture is described and preliminary results of the acoustically teleoperated visits of multiple targets in a testing pool are provided.
Abstract:The nonlinearity and hysteresis of soft robot motions have posed challenges in accurate soft robot control. Neural networks, especially recurrent neural networks (RNNs), have been widely leveraged for this issue due to their nonlinear activation functions and recurrent structures. Although they have shown satisfying accuracy in most tasks, these black-box approaches are not explainable, and hence, they are unsuitable for areas with high safety requirements, like robot-assisted surgery. Based on the RNN controllers, we propose a data-driven explainable controller (DDEC) whose parameters can be updated online. We discuss the Jacobian controller and kinematics controller in theory and demonstrate that they are only special cases of DDEC. Moreover, we utilize RNN, the Jacobian controller, the kinematics controller, and DDECs for trajectory following tasks. Experimental results have shown that our approach outperforms the other controllers considering trajectory following errors while being explainable. We also conduct a study to explore and explain the functions of each DDEC component. This is the first interpretable soft robot controller that overcomes the shortcomings of both NN controllers and interpretable controllers. Future work may involve proposing different DDECs based on different RNN controllers and exploiting them for high-safety-required applications.