Title: On Occlusions in Video Action Detection: Benchmark Datasets And Training Recipes

Oct 25, 2024

Rajat Modi, Vibhav Vineet, Yogesh Singh Rawat


Abstract: This paper explores the impact of occlusions in video action detection. We facilitate this study by introducing five new benchmark datasets: O-UCF and O-JHMDB, consisting of synthetically controlled static/dynamic occlusions; OVIS-UCF and OVIS-JHMDB, consisting of occlusions with realistic motions; and Real-OUCF, for occlusions in real-world scenarios. We formally confirm an intuitive expectation: existing models degrade substantially as occlusion severity increases, and they behave differently when occluders are static than when they are moving. We discover several intriguing phenomena emerging in neural nets: 1) transformers can naturally outperform CNN models, even ones that may have used occlusion as a form of data augmentation during training; 2) incorporating symbolic components like capsules into such backbones allows them to bind to occluders never seen during training; and 3) islands of agreement can emerge in realistic images/videos without instance-level supervision, distillation, or contrastive-based objectives (e.g., video-textual training). Such emergent properties allow us to derive simple yet effective training recipes that lead to robust occlusion models inductively satisfying the first two stages of the binding mechanism (grouping/segregation). Models leveraging these recipes outperform existing video action detectors under occlusion by 32.3% on O-UCF, 32.7% on O-JHMDB, and 2.6% on Real-OUCF in terms of the vMAP metric. The code for this work has been released at https://github.com/rajatmodi62/OccludedActionBenchmark.

* This paper was accepted to the NeurIPS 2023 Datasets and Benchmarks Track. It also showcases Hinton's islands of agreement on realistic datasets, which were previously hypothesized in his GLOM paper.
