Kenneth Kwok

From Grounding to Manipulation: Case Studies of Foundation Model Integration in Embodied Robotic Systems

May 21, 2025

Xiuchao Sui, Daiying Tian, Qi Sun, Ruirui Chen, Dongkyu Choi, Kenneth Kwok, Soujanya Poria

Abstract: Foundation models (FMs) are increasingly used to bridge language and action in embodied agents, yet the operational characteristics of different FM integration strategies remain under-explored, particularly for complex instruction following and versatile action generation in changing environments. This paper examines three paradigms for building robotic systems: end-to-end vision-language-action (VLA) models that implicitly integrate perception and planning, and modular pipelines incorporating either vision-language models (VLMs) or multimodal large language models (LLMs). We evaluate these paradigms through two focused case studies: a complex instruction grounding task assessing fine-grained instruction understanding and cross-modal disambiguation, and an object manipulation task targeting skill transfer via VLA finetuning. Our experiments in zero-shot and few-shot settings reveal trade-offs in generalization and data efficiency. By exploring performance limits, we distill design implications for developing language-driven physical agents and outline emerging challenges and opportunities for FM-powered robotics in real-world conditions.

* 17 pages, 13 figures

A Probabilistic-Logic based Commonsense Representation Framework for Modelling Inferences with Multiple Antecedents and Varying Likelihoods

Dec 15, 2022

Shantanu Jaiswal, Liu Yan, Dongkyu Choi, Kenneth Kwok

Abstract: Commonsense knowledge graphs (CKGs) are important resources for building machines that can 'reason' on text or environmental inputs and make inferences beyond perception. While current CKGs encode world knowledge for a large number of concepts and have been effectively utilized for incorporating commonsense in neural models, they primarily encode declarative or single-condition inferential knowledge and assume all conceptual beliefs to have the same likelihood. Further, these CKGs utilize a limited set of relations shared across concepts and lack a coherent knowledge organization structure, resulting in redundancies as well as sparsity across the larger knowledge graph. Consequently, today's CKGs, while useful for a first level of reasoning, do not adequately capture deeper human-level commonsense inferences, which can be more nuanced and influenced by multiple contextual or situational factors. Accordingly, in this work, we study how commonsense knowledge can be better represented by (i) utilizing a probabilistic logic representation scheme to model composite inferential knowledge and represent conceptual beliefs with varying likelihoods and (ii) incorporating a hierarchical conceptual ontology to identify salient concept-relevant relations and organize beliefs at different conceptual levels. Our resulting knowledge representation framework can encode a wider variety of world knowledge and represent beliefs flexibly using grounded concepts as well as free-text phrases. As a result, the framework can be utilized as both a traditional free-text knowledge graph and a grounded logic-based inference system more suitable for neuro-symbolic applications. We describe how we extend the PrimeNet knowledge base with our framework through crowd-sourcing and expert-annotation, and demonstrate its application for more interpretable passage-based semantic parsing and question answering.

Improving Object Permanence using Agent Actions and Reasoning

Oct 01, 2021

Ying Siu Liang, Chen Zhang, Dongkyu Choi, Kenneth Kwok

Abstract: Object permanence in psychology means knowing that objects still exist even if they are no longer visible. It is a crucial concept for robots to operate autonomously in uncontrolled environments. Existing approaches learn object permanence from low-level perception, but perform poorly in more complex scenarios, such as when objects are contained and carried by others. Knowledge about manipulation actions performed on an object prior to its disappearance allows us to reason about its location, e.g., that the object has been placed in a carrier. In this paper we argue that object permanence can be improved when the robot uses knowledge about executed actions, and we describe an approach to infer hidden object states from agent actions. We show that considering agent actions improves not only rule-based reasoning models but also purely neural approaches, showing its general applicability. Then, we conduct quantitative experiments on a snitch localization task using a dataset of 1,371 synthesized videos, where we compare the performance of different object permanence models with and without action annotations. We demonstrate that action annotations can significantly increase the performance of both neural and rule-based approaches. Finally, we evaluate the usability of our approach in real-world applications by conducting qualitative experiments with two Universal Robots (UR5 and UR16e) in both lab and industrial settings. The robots complete benchmark tasks for a gearbox assembly and demonstrate the object permanence capabilities with real sensor data in an industrial environment.

* 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2021)

Maintaining a Reliable World Model using Action-aware Perceptual Anchoring

Jul 07, 2021

Ying Siu Liang, Dongkyu Choi, Kenneth Kwok

Abstract: Reliable perception is essential for robots that interact with the world. But sensors alone are often insufficient to provide this capability, and they are prone to errors due to various conditions in the environment. Furthermore, there is a need for robots to maintain a model of their surroundings even when objects go out of view and are no longer visible. This requires anchoring perceptual information onto symbols that represent the objects in the environment. In this paper, we present a model for action-aware perceptual anchoring that enables robots to track objects in a persistent manner. Our rule-based approach considers inductive biases to perform high-level reasoning over the results from low-level object detection, and it improves the robot's perceptual capability for complex tasks. We evaluate our model against existing baseline models for object permanence and show that it outperforms these on a snitch localisation task using a dataset of 1,371 videos. We also integrate our action-aware perceptual anchoring in the context of a cognitive architecture and demonstrate its benefits in a realistic gearbox assembly task on a Universal Robot.

* 2021 International Conference on Robotics and Automation (ICRA 2021)
* 7 pages, 3 figures
