Abstract:There is lots of scientific work about object detection in images. For many applications like for example autonomous driving the actual data on which classification has to be done are videos. This work compares different methods, especially those which use Recurrent Neural Networks to detect objects in videos. We differ between feature-based methods, which feed feature maps of different frames into the recurrent units, box-level methods, which feed bounding boxes with class probabilities into the recurrent units and methods which use flow networks. This study indicates common outcomes of the compared methods like the benefit of including the temporal context into object detection and states conclusions and guidelines for video object detection networks.