Abstract: This paper addresses detecting functional objects and inferring human intentions in surveillance videos of public spaces. People in the videos are expected to intentionally take shortest paths, subject to obstacles, toward functional objects where they can satisfy certain needs (e.g., a vending machine can quench thirst). They do so by following one of three possible intent behaviors: reaching a single functional object and stopping, sequentially visiting several functional objects, or initially moving toward one goal and then changing intent to move toward another. Since detecting functional objects in low-resolution surveillance videos is typically unreliable, we call them "dark matter", characterized by the functionality to attract people. We formulate Agent-based Lagrangian Mechanics, wherein human trajectories are probabilistically modeled as motions of agents in many layers of "dark-energy" fields. Each agent can select a particular force field to affect its motions, and thus define the minimum-energy Dijkstra path toward the corresponding source "dark matter". For evaluation, we compiled and annotated a new dataset. The results demonstrate the effectiveness of our approach in predicting human intent behaviors and trajectories, localizing functional objects, and discovering distinct functional classes of objects by clustering human motion behavior in the vicinity of functional objects.
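The minimum-energy Dijkstra path toward a source of "dark matter" can be illustrated with a small sketch. This is not the authors' implementation: it assumes the scene is discretized into a 4-connected grid, that a hypothetical `cost` array gives the non-negative energy of entering each cell, and that obstacles are marked with `None`. The function name `min_energy_path` and the example field are illustrative only.

```python
import heapq
import math


def min_energy_path(cost, start, goal):
    """Dijkstra search on a 4-connected grid (illustrative assumption).

    cost[r][c] is the non-negative energy paid to enter cell (r, c);
    None marks an obstacle. Returns (total_energy, path), or
    (math.inf, []) if the goal cannot be reached.
    """
    rows, cols = len(cost), len(cost[0])
    best = {start: 0.0}          # lowest energy found so far per cell
    parent = {}                  # back-pointers for path reconstruction
    frontier = [(0.0, start)]    # min-heap ordered by accumulated energy

    while frontier:
        energy, cell = heapq.heappop(frontier)
        if cell == goal:
            # Walk the back-pointers from the goal to the start.
            path = [cell]
            while cell in parent:
                cell = parent[cell]
                path.append(cell)
            return energy, path[::-1]
        if energy > best.get(cell, math.inf):
            continue  # stale heap entry, a cheaper route was already found
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            step = cost[nr][nc]
            if step is None:
                continue  # obstacle cell
            new_energy = energy + step
            if new_energy < best.get((nr, nc), math.inf):
                best[(nr, nc)] = new_energy
                parent[(nr, nc)] = cell
                heapq.heappush(frontier, (new_energy, (nr, nc)))

    return math.inf, []


# Hypothetical 3x3 field: one obstacle in the center, one high-energy cell.
field = [
    [1, 1, 1],
    [1, None, 5],
    [1, 1, 1],
]
energy, path = min_energy_path(field, start=(0, 0), goal=(2, 2))
```

In this toy field the agent goes around the obstacle through the low-energy cells on the left and bottom rather than through the high-energy cell on the right. In the setting of the abstract, each layer of the field would presumably define its own cost map, so the same search would yield a different path per selected goal.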
Abstract: With the advent of drones, aerial video analysis is becoming increasingly important; yet, it has received scant attention in the literature. This paper addresses a new problem of parsing low-resolution aerial videos of large spatial areas, in terms of 1) grouping, 2) recognizing events, and 3) assigning roles to people engaged in events. We propose a novel framework for joint inference of the above tasks, as reasoning about each in isolation typically fails in our setting. Given noisy tracklets of people and detections of large objects and scene surfaces (e.g., building, grass), we use a spatiotemporal AND-OR graph to drive our joint inference, using Markov Chain Monte Carlo and dynamic programming. We also introduce a new formalism of spatiotemporal templates characterizing latent sub-events. For evaluation, we have collected and released a new aerial video dataset using a hex-rotor flying over picnic areas rich with group events. Our results demonstrate that we successfully address the above inference tasks under challenging conditions.