Abstract:Research on video activity detection has primarily focused on identifying well-defined human activities in short video segments. The majority of the research on video activity recognition is focused on the development of large parameter systems that require training on large video datasets. This paper develops a low-parameter, modular system with rapid inferencing capabilities that can be trained entirely on limited datasets without requiring transfer learning from large-parameter systems. The system can accurately detect and associate specific activities with the students who perform the activities in real-life classroom videos. Additionally, the paper develops an interactive web-based application to visualize human activity maps over long real-life classroom videos.
Abstract:Long-term object detection requires the integration of frame-based results over several seconds. For non-deformable objects, long-term detection is often addressed using object detection followed by video tracking. Unfortunately, tracking is inapplicable to objects that undergo dramatic changes in appearance from frame to frame. As a related example, we study hand detection over long video recordings in collaborative learning environments. More specifically, we develop long-term hand detection methods that can deal with partial occlusions and dramatic changes in appearance. Our approach integrates object-detection, followed by time projections, clustering, and small region removal to provide effective hand detection over long videos. The hand detector achieved average precision (AP) of 72% at 0.5 intersection over union (IoU). The detection results were improved to 81% by using our optimized approach for data augmentation. The method runs at 4.7x the real-time with AP of 81% at 0.5 intersection over the union. Our method reduced the number of false-positive hand detections by 80% by improving IoU ratios from 0.2 to 0.5. The overall hand detection system runs at 4x real-time.