Abstract: Active perception enables robots to gather information dynamically by adjusting their viewpoints, a crucial capability for interacting with complex, partially observable environments. In this paper, we present AP-VLM, a novel framework that combines active perception with a Vision-Language Model (VLM) to guide robotic exploration and answer semantic queries. Using a 3D virtual grid overlaid on the scene together with orientation adjustments, AP-VLM allows a robotic manipulator to select viewpoints and orientations that resolve challenging tasks, such as identifying objects in occluded or inclined positions. We evaluate our system on two robotic platforms, a 7-DOF Franka Panda and a 6-DOF UR5, across various scenes with differing object configurations. Our results demonstrate that AP-VLM significantly outperforms passive perception methods and baseline models, including Toward Grounded Common Sense Reasoning (TGCSR), particularly in scenarios where fixed camera views are inadequate. The adaptability of AP-VLM in real-world settings shows promise for enhancing robotic systems' understanding of complex environments, bridging the gap between high-level semantic reasoning and low-level control.
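The idea of a 3D virtual grid of candidate viewpoints combined with orientation adjustments can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from AP-VLM itself: the function names `make_view_grid` and `look_at_angles`, the grid dimensions, the target point, and the yaw/pitch convention are all hypothetical. The abstract does not specify how the grid is built or how the VLM chooses among cells.

```python
import itertools
import math


def make_view_grid(origin, size, cells):
    """Return the centre points of a regular 3D grid of candidate viewpoints.

    origin: (x, y, z) of the grid's minimum corner, in metres (assumed units).
    size:   (sx, sy, sz) extent of the grid along each axis.
    cells:  (nx, ny, nz) number of cells along each axis.
    """
    axes = []
    for o, s, n in zip(origin, size, cells):
        step = s / n
        # Use cell centres so that no viewpoint sits on the grid boundary.
        axes.append([o + (i + 0.5) * step for i in range(n)])
    return list(itertools.product(*axes))


def look_at_angles(camera, target):
    """Yaw and pitch (radians) that point a camera at `target`.

    Hypothetical convention: yaw is measured about the z axis from +x,
    pitch is the elevation above the x-y plane.
    """
    dx, dy, dz = (t - c for t, c in zip(target, camera))
    yaw = math.atan2(dy, dx)
    pitch = math.atan2(dz, math.hypot(dx, dy))
    return yaw, pitch


# Hypothetical 3 x 3 x 3 grid hovering above a table-top scene.
grid = make_view_grid(origin=(0.0, 0.0, 0.2), size=(0.6, 0.6, 0.6), cells=(3, 3, 3))
# Orientation that aims the first candidate viewpoint at an assumed object location.
yaw, pitch = look_at_angles(grid[0], target=(0.3, 0.3, 0.0))
```

In a full system, a selection step (in AP-VLM, guided by the VLM) would pick which grid cell and orientation to visit next; that step is not shown here because the abstract gives no detail on it.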
Abstract: Selective harvesting by autonomous robots will be a critical enabling technology for future farming. Rising inflation and shortages of skilled labour are driving factors that may encourage user acceptance of robotic harvesting. For example, robotic strawberry harvesting requires real-time, high-precision fruit localisation, 3D mapping and path planning for 3D cluster manipulation. Whilst industry and academia have developed multiple strawberry harvesting robots, none has yet achieved human-cost parity. Achieving this goal requires increased picking speed (perception, control and movement), improved accuracy and the development of low-cost robotic system designs. We propose the edge-server over 5G for Selective Harvesting (E5SH) system, which integrates a high-bandwidth, low-latency Fifth Generation (5G) mobile network into a crop harvesting robotic platform and which we view as an enabler for future robotic harvesting systems. We also consider processing scale and speed in conjunction with the system's environmental and energy costs. A system architecture is presented and evaluated using quantitative results from a series of experiments that compare system performance under different architecture choices, including image segmentation models, network infrastructure (5G vs WiFi) and messaging protocols such as Message Queuing Telemetry Transport (MQTT) and Transmission Control Protocol Robot Operating System (TCPROS). Our results demonstrate that the E5SH system delivers a step change in peak processing performance, with a speedup of more than 18-fold over a stand-alone embedded Nvidia Jetson Xavier NX (NJXN) system.
Abstract: Acoustic Soft Tactile (AST) skin is a novel sensing technology that derives tactile information from the modulation of acoustic waves travelling through the skin's embedded acoustic channels. A generalisable data-driven calibration model maps the acoustic modulations to the corresponding tactile information, namely contact forces together with their contact locations and contact geometries. AST skin technology has been highlighted for its easy customisation. As a case study, this paper discusses the possibility of using AST skin on a custom-built robotic end effector finger for strawberry handling. The paper covers the design, prototyping, and calibration method used to sensorise the end effector finger with AST skin. A real-time force-controlled gripping experiment is conducted in which the sensorised finger handles strawberries by their peduncle. The finger successfully gripped the strawberry peduncle while maintaining a preset force of 2 N, with a maximum Mean Absolute Error (MAE) of 0.31 N over multiple peduncle diameters and strawberry weight classes. This study therefore builds confidence in the usability of AST skin for generating real-time tactile feedback in robot manipulation tasks.
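As a rough illustration of how a force-tracking result such as the MAE above is computed, the sketch below evaluates the mean absolute error of force readings against a 2 N setpoint and shows one step of a simple proportional grip update. The force readings, the `grip_step` function, its gain, and the use of a proportional law are all assumptions made for illustration; the abstract does not describe the controller actually used.

```python
def mean_absolute_error(measured, setpoint):
    """Mean absolute deviation of force readings (N) from the target force (N)."""
    if not measured:
        raise ValueError("at least one force reading is required")
    return sum(abs(f - setpoint) for f in measured) / len(measured)


def grip_step(position, measured_force, setpoint=2.0, gain=0.001):
    """One proportional update of a hypothetical finger closing position.

    The finger closes further (position increases) when the measured force is
    below the setpoint and opens when it is above. The gain is an assumed value.
    """
    return position + gain * (setpoint - measured_force)


# Hypothetical force readings in newtons, not data from the paper.
readings = [1.8, 2.1, 2.3, 1.9]
mae = mean_absolute_error(readings, setpoint=2.0)

# With a reading below the setpoint, the update moves the finger towards closing.
next_position = grip_step(position=0.010, measured_force=1.8)
```

For these made-up readings the absolute errors are 0.2, 0.1, 0.3 and 0.1 N, so the MAE is about 0.175 N.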