Abstract:Observing and analyzing children's social behaviors is crucial for the early diagnosis of Autism Spectrum Disorders (ASD). This work focuses on automatically detecting ASD using computer vision techniques and large language models (LLMs). Existing methods typically rely on supervised learning. However, the scarcity of ASD diagnostic datasets and the lack of interpretability in diagnostic results significantly limits its clinical application. To address these challenges, we introduce a novel unsupervised approach based on script-centric behavior understanding. Our pipeline converts video content into scripts that describe the behavior of characters, leveraging the generalizability of large language models to detect ASD in a zero-shot or few-shot manner. Specifically, we propose a scripts transcription module for multimodal behavior data textualization and a domain prompts module to bridge LLMs. Our method achieves an accuracy of 92.00\% in diagnosing ASD in children with an average age of 24 months, surpassing the performance of supervised learning methods by 3.58\% absolutely. Extensive experiments confirm the effectiveness of our approach and suggest its potential for advancing ASD research through LLMs.
Abstract:We give an overview of AutoInspect, a ROS-based software system for robust and extensible mission-level autonomy. Over the past three years AutoInspect has been deployed in a variety of environments, including at a mine, a chemical plant, a mock oil rig, decommissioned nuclear power plants, and a fusion reactor for durations ranging from hours to weeks. The system combines robust mapping and localisation with graph-based autonomous navigation, mission execution, and scheduling to achieve a complete autonomous inspection system. The time from arrival at a new site to autonomous mission execution can be under an hour. It is deployed on a Boston Dynamics Spot robot using a custom sensing and compute payload called Frontier. In this work we go into detail of the system's performance in two long-term deployments of 49 days at a robotics test facility, and 35 days at the Joint European Torus (JET) fusion reactor in Oxfordshire, UK.
Abstract:When transferring a Deep Reinforcement Learning model from simulation to the real world, the performance could be unsatisfactory since the simulation cannot imitate the real world well in many circumstances. This results in a long period of fine-tuning in the real world. This paper proposes a self-supervised vision-based DRL method that allows robots to pick and place objects effectively and efficiently when directly transferring a training model from simulation to the real world. A height-sensitive action policy is specially designed for the proposed method to deal with crowded and stacked objects in challenging environments. The training model with the proposed approach can be applied directly to a real suction task without any fine-tuning from the real world while maintaining a high suction success rate. It is also validated that our model can be deployed to suction novel objects in a real experiment with a suction success rate of 90\% without any real-world fine-tuning. The experimental video is available at: https://youtu.be/jSTC-EGsoFA.
Abstract:Sim-and-real training is a promising alternative to sim-to-real training for robot manipulations. However, the current sim-and-real training is neither efficient, i.e., slow convergence to the optimal policy, nor effective, i.e., sizeable real-world robot data. Given limited time and hardware budgets, the performance of sim-and-real training is not satisfactory. In this paper, we propose a Consensus-based Sim-And-Real deep reinforcement learning algorithm (CSAR) for manipulator pick-and-place tasks, which shows comparable performance in both sim-and-real worlds. In this algorithm, we train the agents in simulators and the real world to get the optimal policies for both sim-and-real worlds. We found two interesting phenomenons: (1) Best policy in simulation is not the best for sim-and-real training. (2) The more simulation agents, the better sim-and-real training. The experimental video is available at: https://youtu.be/mcHJtNIsTEQ.