Bastien Vanderplaetse

Improved Soccer Action Spotting using both Audio and Video Streams

Nov 09, 2020

Bastien Vanderplaetse, Stéphane Dupont

Abstract: In this paper, we study multi-modal (audio and video) action spotting and classification in soccer videos. Action spotting and classification consist of finding the temporal anchors of events in a video and determining which event each one is. This is an important application of general activity understanding. Here, we present an experimental study on combining audio and video information at different stages of deep neural network architectures. We used the SoccerNet benchmark dataset, which contains annotated events for 500 soccer game videos from the Big Five European leagues. Through this work, we evaluated several ways to integrate the audio stream into video-only architectures. We observed an average absolute improvement in mean Average Precision (mAP) of $7.43\%$ for the action classification task and of $4.19\%$ for the action spotting task.

* Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2020, pp. 896-897

Can adversarial training learn image captioning ?

Oct 31, 2019

Jean-Benoit Delbrouck, Bastien Vanderplaetse, Stéphane Dupont

Abstract: Recently, generative adversarial networks (GANs) have attracted a lot of interest. Their efficiency in generating unseen samples of high quality, especially images, has improved over the years. In the field of Natural Language Generation (NLG), using the adversarial setting to generate meaningful sentences has proven difficult for two reasons: the lack of existing architectures that produce realistic sentences and the lack of evaluation tools. In this paper, we propose an adversarial architecture related to the conditional GAN (cGAN) that generates sentences according to a given image (also called image captioning). This attempt is the first that uses no pre-training or reinforcement methods. We also explain why our experimental settings can be safely evaluated and interpreted in future work.

* Accepted to NeurIPS 2019 ViGiL workshop
