Oleg Kobzarev

GestOS: Advanced Hand Gesture Interpretation via Large Language Models to control Any Type of Robot

Sep 17, 2025

Artem Lykov, Oleg Kobzarev, Dzmitry Tsetserukou

Abstract: We present GestOS, a gesture-based operating system for high-level control of heterogeneous robot teams. Unlike prior systems that map gestures to fixed commands or single-agent actions, GestOS interprets hand gestures semantically and dynamically distributes tasks across multiple robots based on their capabilities, current state, and supported instruction sets. The system combines lightweight visual perception with large language model (LLM) reasoning: hand poses are converted into structured textual descriptions, which the LLM uses to infer intent and generate robot-specific commands. A robot selection module ensures that each gesture-triggered task is matched to the most suitable agent in real time. This architecture enables context-aware, adaptive control without requiring explicit user specification of targets or commands. By advancing gesture interaction from recognition to intelligent orchestration, GestOS supports scalable, flexible, and user-friendly collaboration with robotic systems in dynamic environments.
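The abstract does not give implementation details, so the following is only a minimal sketch of what capability- and state-aware robot selection could look like. The `Robot` class, the `select_robot` function, the capability names, and the tie-breaking rule (prefer the idle robot with the fewest capabilities) are all hypothetical and are not taken from the paper; the LLM step that would infer the required capability from a gesture description is assumed to have already run.

```python
from dataclasses import dataclass


@dataclass
class Robot:
    """Hypothetical robot record; field names are illustrative assumptions."""
    name: str
    capabilities: set   # e.g. {"grasp"} or {"walk", "carry"}
    busy: bool = False  # stand-in for the robot's "current state"


def select_robot(required_capability, robots):
    """Return an idle robot that supports the capability, or None.

    Assumption: among eligible robots, the most specialised one (fewest
    capabilities) is preferred so that generalist robots stay available.
    """
    candidates = [
        r for r in robots
        if required_capability in r.capabilities and not r.busy
    ]
    if not candidates:
        return None
    # Sort key: fewest capabilities first, then name for a deterministic tie-break.
    return min(candidates, key=lambda r: (len(r.capabilities), r.name))


if __name__ == "__main__":
    team = [
        Robot("arm", {"grasp"}),
        Robot("dog", {"walk", "carry", "grasp"}),
        Robot("drone", {"fly", "inspect"}, busy=True),
    ]
    chosen = select_robot("grasp", team)
    print(chosen.name if chosen else "no robot available")
```

In a fuller pipeline the string passed as `required_capability` would come from the LLM's interpretation of the gesture, and the selected robot would then receive a command drawn from its own instruction set.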

GestLLM: Advanced Hand Gesture Interpretation via Large Language Models for Human-Robot Interaction

Jan 14, 2025

Oleg Kobzarev, Artem Lykov, Dzmitry Tsetserukou

Abstract: This paper introduces GestLLM, an advanced system for human-robot interaction that enables intuitive robot control through hand gestures. Unlike conventional systems, which rely on a limited set of predefined gestures, GestLLM leverages large language models and feature extraction via MediaPipe to interpret a diverse range of gestures. This integration addresses key limitations in existing systems, such as restricted gesture flexibility and the inability to recognize complex or unconventional gestures commonly used in human communication. By combining state-of-the-art feature extraction and language model capabilities, GestLLM achieves performance comparable to leading vision-language models while supporting gestures underrepresented in traditional datasets. For example, this includes gestures from popular culture, such as the "Vulcan salute" from Star Trek, without any additional pretraining, prompt engineering, etc. This flexibility enhances the naturalness and inclusivity of robot control, making interactions more intuitive and user-friendly. GestLLM provides a significant step forward in gesture-based interaction, enabling robots to understand and respond to a wide variety of hand gestures effectively. This paper outlines its design, implementation, and evaluation, demonstrating its potential applications in advanced human-robot collaboration, assistive robotics, and interactive entertainment.
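The abstract does not specify how hand features are turned into text for the language model, so the sketch below is only one plausible illustration. It assumes the 21-point hand landmark layout used by MediaPipe Hands (wrist at index 0, fingertips at 4, 8, 12, 16, 20) and takes plain `(x, y)` tuples rather than calling MediaPipe itself. The extension heuristic (a finger counts as extended when its tip is farther from the wrist than its middle joint) and the function names are assumptions, not the authors' method.

```python
import math

# (middle joint index, fingertip index) per finger in the 21-landmark layout.
FINGER_JOINTS = {
    "thumb": (3, 4),
    "index": (6, 8),
    "middle": (10, 12),
    "ring": (14, 16),
    "pinky": (18, 20),
}
WRIST = 0


def extended_fingers(landmarks):
    """Return names of fingers whose tip is farther from the wrist than the joint.

    `landmarks` is a sequence of 21 (x, y) pairs. This is a crude heuristic
    chosen for illustration; it is especially rough for the thumb.
    """
    if len(landmarks) != 21:
        raise ValueError("expected 21 hand landmarks")
    wrist = landmarks[WRIST]
    return [
        name
        for name, (joint, tip) in FINGER_JOINTS.items()
        if math.dist(landmarks[tip], wrist) > math.dist(landmarks[joint], wrist)
    ]


def describe_hand(landmarks):
    """Build a short textual description that could be placed in an LLM prompt."""
    up = extended_fingers(landmarks)
    down = [name for name in FINGER_JOINTS if name not in up]
    return (
        f"Extended fingers: {', '.join(up) or 'none'}. "
        f"Folded fingers: {', '.join(down) or 'none'}."
    )


if __name__ == "__main__":
    # Synthetic pose: every point starts at the wrist, then index and middle are raised.
    pose = [(0.5, 0.9)] * 21
    pose[6], pose[8] = (0.5, 0.6), (0.5, 0.3)
    pose[10], pose[12] = (0.55, 0.6), (0.55, 0.3)
    print(describe_hand(pose))
```

A description of this kind leaves the naming of the gesture to the language model, which is in keeping with the abstract's claim that no fixed gesture vocabulary is required.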

CognitiveOS: Large Multimodal Model based System to Endow Any Type of Robot with Generative AI

Jan 29, 2024

Artem Lykov, Mikhail Konenkov, Koffivi Fidèle Gbagbe, Mikhail Litvinov, Robinroy Peter, Denis Davletshin, Aleksey Fedoseev, Oleg Kobzarev, Ali Alabbas, Oussama Alyounes (+2 more)

Abstract: This paper introduces CognitiveOS, a disruptive system based on multiple transformer-based models, endowing robots of various types with cognitive abilities not only for communication with humans but also for task resolution through physical interaction with the environment. The system operates smoothly on different robotic platforms without extra tuning. It autonomously makes decisions for task execution by analyzing the environment and using information from its long-term memory. The system underwent testing on various platforms, including quadruped robots and manipulator robots, showcasing its capability to formulate behavioral plans even for robots whose behavioral examples were absent in the training dataset. Experimental results revealed the system's high performance in advanced task comprehension and adaptability, emphasizing its potential for real-world applications. The chapters of this paper describe the key components of the system and the dataset structure. The dataset for fine-tuning the step generation model is provided at the following link: link coming soon

* Paper submitted to CHI 2024
