Title: GestureGPT: Zero-shot Interactive Gesture Understanding and Grounding with Large Language Model Agents

Oct 30, 2023

Xin Zeng, Xiaoyu Wang, Tengxiang Zhang, Chun Yu, Shengdong Zhao, Yiqiang Chen

Abstract: Current gesture recognition systems primarily focus on identifying gestures within a predefined set, leaving a gap in connecting these gestures to interactive GUI elements or system functions (e.g., linking a 'thumb-up' gesture to a 'like' button). We introduce GestureGPT, a novel zero-shot gesture understanding and grounding framework leveraging large language models (LLMs). Gesture descriptions are formulated from hand landmark coordinates extracted from gesture videos and fed into our dual-agent dialogue system. A gesture agent deciphers these descriptions and queries the interaction context (e.g., interface, history, gaze data), which a context agent organizes and provides. After iterative exchanges, the gesture agent discerns the user's intent and grounds it to an interactive function. We validated the gesture description module on public first-view and third-view gesture datasets and tested the whole system in two real-world settings: video streaming and smart home IoT control. The highest zero-shot Top-5 grounding accuracies are 80.11% for video streaming and 90.78% for smart home tasks, showing the potential of this new gesture understanding paradigm.
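To make the reported metric concrete, here is a minimal sketch of how a Top-k grounding accuracy could be computed, assuming each gesture sample yields a ranked list of candidate functions and has one ground-truth function. The function name `top_k_accuracy`, the data layout, and the example labels are illustrative assumptions, not the authors' evaluation code.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    targets: Sequence[str],
    k: int = 5,
) -> float:
    """Fraction of samples whose true function appears in the first k candidates.

    ranked_predictions[i] is a hypothetical ranked candidate list for sample i
    (best guess first); targets[i] is the ground-truth function for sample i.
    """
    if len(ranked_predictions) != len(targets):
        raise ValueError("ranked_predictions and targets must have the same length")
    if not targets:
        raise ValueError("at least one sample is required")
    if k < 1:
        raise ValueError("k must be at least 1")

    hits = sum(
        1
        for ranked, target in zip(ranked_predictions, targets)
        if target in ranked[:k]
    )
    return hits / len(targets)


# Made-up example data, for illustration only (not results from the paper).
example_ranked = [
    ["like", "pause", "volume_up"],
    ["pause", "play", "like"],
    ["mute", "next", "back"],
]
example_targets = ["like", "play", "pause"]

if __name__ == "__main__":
    for k in (1, 2, 5):
        print(k, top_k_accuracy(example_ranked, example_targets, k=k))
```

Under this definition, a sample counts as correct whenever the intended function is anywhere among the first k candidates, so Top-5 accuracy is always at least as high as Top-1 accuracy on the same data.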
