Abstract: Human-robot collaboration (HRC) in structured assembly requires reliable state estimation and adaptive task planning under noisy perception and human interventions. To address these challenges, we introduce a design-grounded, human-aware planning framework for human-robot collaborative structured assembly. The framework comprises two coupled modules. Module I, Perception-to-Symbolic State (PSS), employs vision-language model (VLM)-based agents to align RGB-D observations with design specifications and domain knowledge, synthesizing verifiable symbolic assembly states. It outputs validated installed and uninstalled component sets for online state tracking. Module II, Human-Aware Planning and Replanning (HPR), performs task-level multi-robot assignment and updates the plan only when the observed state deviates from the expected execution outcome. It applies a minimal-change replanning rule to selectively revise task assignments and preserve plan stability even under human interventions. We validate the framework on a 27-component timber-frame assembly. The PSS module achieves 97% state-synthesis accuracy, and the HPR module maintains feasible task progression across diverse HRC scenarios. The results indicate that integrating VLM-based perception with knowledge-driven planning improves the robustness of state estimation and task planning under dynamic conditions.
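To make the minimal-change replanning idea concrete, below is a minimal Python sketch. The (robot, component) plan representation, the set-based symbolic state, and the least-loaded reassignment heuristic are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of a minimal-change replanning rule (illustrative, not the
# paper's API). A plan is an ordered list of (robot, component) assignments;
# the symbolic state is a set of installed component IDs.

def minimal_change_replan(plan, expected_installed, observed_installed):
    """Revise only the assignments invalidated by the observed deviation."""
    # Components a human installed ahead of schedule: drop their tasks.
    human_completed = observed_installed - expected_installed
    # Components expected to be installed but missing: re-queue their tasks.
    missing = expected_installed - observed_installed

    revised = [(robot, comp) for robot, comp in plan
               if comp not in human_completed]

    # Assumed heuristic: give each missing component to the least-loaded robot
    # and redo it before new work, leaving all other assignments untouched.
    robots = {robot for robot, _ in plan} or {"robot_1"}
    for comp in sorted(missing):
        least_loaded = min(
            robots, key=lambda r: sum(1 for rb, _ in revised if rb == r))
        revised.insert(0, (least_loaded, comp))
    return revised

# Example: a human installed beam_08, so only that task is removed.
plan = [("r1", "beam_07"), ("r2", "beam_08"), ("r1", "beam_09")]
print(minimal_change_replan(plan,
                            expected_installed={"beam_06"},
                            observed_installed={"beam_06", "beam_08"}))
# -> [('r1', 'beam_07'), ('r1', 'beam_09')]
```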




Abstract: In the realm of mental health support chatbots, it is vital to show empathy and encourage self-exploration in order to provide tailored solutions. However, current approaches tend to offer general insights or solutions without fully understanding the help-seeker's situation. We therefore propose PsyMix, a chatbot that analyzes the seeker's state from the perspective of a psychotherapy approach (Chain-of-Psychotherapies, CoP) before generating its response, and learns to combine the strengths of various psychotherapies by fine-tuning on a mixture of CoPs. Through comprehensive evaluation, we find that PsyMix outperforms a ChatGPT baseline and demonstrates a level of empathy in its responses comparable to that of human counselors.
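The two-stage analyze-then-respond structure can be sketched as follows. The prompt templates, the `generate` callable, and the default therapy name are assumptions for illustration; the paper's exact CoP templates and fine-tuning setup are not shown here.

```python
# Minimal sketch of Chain-of-Psychotherapies (CoP) style inference, assuming
# a text-generation callable `generate(prompt) -> str`. Prompt wording is
# illustrative, not PsyMix's actual templates.

COP_TEMPLATE = (
    "From the perspective of {therapy}, analyze the help-seeker's "
    "situation, emotions, and needs.\nHelp-seeker: {utterance}\nAnalysis:"
)

RESPONSE_TEMPLATE = (
    "Help-seeker: {utterance}\n"
    "Analysis ({therapy}): {analysis}\n"
    "Write an empathetic counselor response grounded in the analysis above:"
)

def cop_respond(generate, utterance, therapy="cognitive behavioral therapy"):
    # Stage 1: derive a CoP analysis of the seeker's state.
    analysis = generate(COP_TEMPLATE.format(
        therapy=therapy, utterance=utterance))
    # Stage 2: condition the final reply on that analysis.
    return generate(RESPONSE_TEMPLATE.format(
        utterance=utterance, therapy=therapy, analysis=analysis))
```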




Abstract: Large Vision-Language Models (LVLMs), despite their recent success, have hardly been comprehensively tested for their cognitive abilities. Inspired by the prevalent use of the "Cookie Theft" task in human cognition tests, we propose a novel benchmark to evaluate the high-level cognitive abilities of LVLMs using images with rich semantics. The benchmark defines eight reasoning capabilities and consists of an image description task and a visual question answering task. Our evaluation of well-known LVLMs shows that a large gap in cognitive ability remains between LVLMs and humans.
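A benchmark of this shape can be scored per reasoning capability; the sketch below shows one way to do so. The item fields, the `model(image, question)` interface, and exact-match scoring are assumptions, not the benchmark's released format.

```python
# Minimal sketch of a per-capability VQA evaluation harness (illustrative).

from dataclasses import dataclass

@dataclass
class VQAItem:
    image_path: str
    question: str
    answer: str
    capability: str  # one of the eight reasoning capabilities

def evaluate_vqa(model, items):
    """Return per-capability accuracy for a model(image, question) -> str."""
    correct, total = {}, {}
    for item in items:
        pred = model(item.image_path, item.question).strip().lower()
        hit = pred == item.answer.strip().lower()
        correct[item.capability] = correct.get(item.capability, 0) + int(hit)
        total[item.capability] = total.get(item.capability, 0) + 1
    return {cap: correct[cap] / total[cap] for cap in total}
```
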
Abstract: We demonstrate multi-mobile-robot navigation based on Visible Light Positioning (VLP) localization. Our experiments show that VLP can accurately localize the robots' positions during navigation.
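One common localization step in VLP systems is trilateration from distances to ceiling LED beacons, sketched below with linear least squares. The beacon layout and the availability of distance measurements are assumptions; how the paper derives distances from received light is not shown here.

```python
# Minimal sketch of trilateration, one common VLP localization step
# (illustrative, not necessarily the paper's pipeline).

import numpy as np

def trilaterate(beacons, distances):
    """Estimate (x, y) from >=3 known beacon positions and distances."""
    beacons = np.asarray(beacons, dtype=float)
    d = np.asarray(distances, dtype=float)
    x1, y1 = beacons[0]
    # Linearize by subtracting the first circle equation from the others:
    # 2(xi-x1)x + 2(yi-y1)y = d1^2 - di^2 + (xi^2+yi^2) - (x1^2+y1^2)
    A = 2.0 * (beacons[1:] - beacons[0])
    b = (d[0] ** 2 - d[1:] ** 2
         + np.sum(beacons[1:] ** 2, axis=1) - (x1 ** 2 + y1 ** 2))
    pos, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pos  # [x, y]

# Example: three LEDs; the true position is (2.0, 1.5).
print(trilaterate([(0, 0), (4, 0), (0, 3)], [2.5, 2.5, 2.5]))
```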