Abstract: Healthcare systems worldwide face persistent challenges in efficiency, accessibility, and personalization. Powered by modern AI technologies such as multimodal large language models and world models, Embodied AI (EmAI) represents a transformative frontier, offering enhanced autonomy and the ability to interact with the physical world to address these challenges. As an interdisciplinary and rapidly evolving research domain, "EmAI in healthcare" spans diverse fields such as algorithms, robotics, and biomedicine. This complexity underscores the importance of timely reviews and analyses to track advancements, address challenges, and foster cross-disciplinary collaboration. In this paper, we provide a comprehensive overview of the "brain" of EmAI for healthcare: we introduce foundational AI algorithms for perception, actuation, planning, and memory, and focus on healthcare applications spanning clinical interventions, daily care and companionship, infrastructure support, and biomedical research. Despite its promise, the development of EmAI for healthcare is hindered by critical challenges such as safety concerns, gaps between simulation platforms and real-world applications, the absence of standardized benchmarks, and uneven progress across interdisciplinary domains. We discuss the technical barriers and explore ethical considerations, offering a forward-looking perspective on the future of EmAI in healthcare. A hierarchical framework of intelligence levels for EmAI systems is also introduced to guide further development. By providing systematic insights, this work aims to inspire innovation and practical applications, paving the way for a new era of intelligent, patient-centered healthcare.
Abstract: Recent advances in large language models (LLMs) have the potential to shed light on the debate regarding the extent to which knowledge representation requires the grounding of embodied experience. Despite learning from limited modalities (e.g., text for GPT-3.5, and text and images for GPT-4), LLMs have nevertheless demonstrated human-like behaviors in various psychology tasks, which may provide an alternative interpretation of the acquisition of conceptual knowledge. We compared lexical conceptual representations between humans and ChatGPT (GPT-3.5 and GPT-4) on subjective ratings of various lexical conceptual features or dimensions (e.g., emotional arousal, concreteness, haptic). The results show that ratings from both GPT-3.5 and GPT-4 correlated strongly with human ratings on some abstract dimensions, such as emotion and salience. On dimensions related to sensory and motor domains, GPT-3.5 shows weaker correlations, while GPT-4 improves substantially on GPT-3.5. Still, GPT-4 struggles to fully capture motor aspects of conceptual knowledge such as actions with the foot/leg, mouth/throat, and torso. Moreover, we found that GPT-4's progress can largely be associated with its training in the visual domain. Certain aspects of conceptual representation appear to exhibit a degree of independence from sensory capacities, but others seem to necessitate them. Our findings provide insights into the complexities of knowledge representation from diverse perspectives and highlight the potential influence of embodied experience in shaping language and cognition.
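The per-dimension comparison of human and model ratings described above can be illustrated with a small sketch. The abstract does not say which correlation statistic was used, so Spearman rank correlation is an assumption here. The dimension labels `dim_a` and `dim_b` and all rating values are invented solely to exercise the code; they are not data or results from the study.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical ratings for the same five words on two placeholder dimensions.
human = {
    "dim_a": [6.1, 2.3, 4.8, 1.9, 5.5],
    "dim_b": [4.5, 0.3, 1.2, 3.9, 0.8],
}
model = {
    "dim_a": [5.8, 2.9, 4.1, 2.0, 5.9],
    "dim_b": [2.0, 1.5, 2.2, 1.0, 1.8],
}

# One correlation per dimension, comparing human and model ratings word by word.
per_dimension = {dim: spearman(human[dim], model[dim]) for dim in human}

for dim, rho in per_dimension.items():
    print(f"{dim}: rho = {rho:.2f}")
```

Rank correlation is chosen in this sketch because subjective ratings are ordinal; a study using interval-scaled norms could equally call `pearson` directly on the raw ratings.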