Title: Learning Human Activities and Object Affordances from RGB-D Videos

May 06, 2013

Hema Swetha Koppula, Rudhir Gupta, Ashutosh Saxena

Abstract: Understanding human activities and understanding object affordances are two very important skills, especially for personal robots that operate in human environments. In this work, we consider the problem of extracting a descriptive labeling of the sequence of sub-activities being performed by a human and, more importantly, of their interactions with the objects in the form of associated affordances. Given an RGB-D video, we jointly model the human activities and object affordances as a Markov random field in which the nodes represent objects and sub-activities, and the edges represent the relationships between object affordances, their relations with sub-activities, and their evolution over time. We formulate the learning problem using a structural support vector machine (SSVM) approach, where labelings over various alternate temporal segmentations are considered as latent variables. We tested our method on a challenging dataset comprising 120 activity videos collected from 4 subjects, and obtained an accuracy of 79.4% for affordance, 63.4% for sub-activity and 75.0% for high-level activity labeling. We then demonstrate the use of such descriptive labeling in performing assistive tasks by a PR2 robot.
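To make the joint model in the abstract more concrete, the following is a minimal sketch of how a labeling could be scored in a pairwise Markov random field with one object node and one sub-activity node. It is an illustration under stated assumptions, not the authors' implementation: the node names, label names, and potential values are hypothetical, and brute-force enumeration stands in for whatever inference procedure the paper actually uses.

```python
from itertools import product


def score(labeling, node_scores, edge_scores, edges):
    """Sum node and edge potentials for one joint labeling."""
    total = sum(node_scores[n][labeling[n]] for n in labeling)
    # Label pairs with no listed potential contribute 0.
    total += sum(
        edge_scores[(a, b)].get((labeling[a], labeling[b]), 0.0)
        for a, b in edges
    )
    return total


def best_labeling(node_scores, edge_scores, edges):
    """Exhaustively search all joint labelings (fine only for tiny graphs)."""
    nodes = list(node_scores)
    best, best_score = None, float("-inf")
    for combo in product(*(node_scores[n] for n in nodes)):
        labeling = dict(zip(nodes, combo))
        s = score(labeling, node_scores, edge_scores, edges)
        if s > best_score:
            best, best_score = labeling, s
    return best, best_score


# Hypothetical potentials for one temporal segment (all values assumed).
node_scores = {
    "cup": {"drinkable": 0.4, "stationary": 0.6},    # object affordance node
    "human": {"drinking": 0.5, "reaching": 0.5},     # sub-activity node
}
edges = [("cup", "human")]
edge_scores = {
    # Object-to-sub-activity edge: reward a compatible label pair.
    ("cup", "human"): {("drinkable", "drinking"): 1.0},
}

best, best_score = best_labeling(node_scores, edge_scores, edges)
```

Taken alone, the object node slightly prefers "stationary", but the edge potential pulls the joint labeling toward the compatible pair, which is the point of modeling affordances and sub-activities together. Edges linking the same node across consecutive temporal segments could be added to `edges` and `edge_scores` in the same way.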

* arXiv admin note: substantial text overlap with arXiv:1208.0967
