Title: AutoAD II: The Sequel -- Who, When, and What in Movie Audio Description

Oct 10, 2023

Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman

Abstract: Audio Description (AD) is the task of generating descriptions of visual content, at suitable time intervals, for the benefit of visually impaired audiences. For movies, this presents notable challenges -- AD must occur only during existing pauses in dialogue, should refer to characters by name, and ought to aid understanding of the storyline as a whole. To this end, we develop a new model for automatically generating movie AD, given CLIP visual features of the frames, the cast list, and the temporal locations of the speech; addressing all three of the 'who', 'when', and 'what' questions: (i) who -- we introduce a character bank consisting of the character's name, the actor who played the part, and a CLIP feature of their face, for the principal cast of each movie, and demonstrate how this can be used to improve naming in the generated AD; (ii) when -- we investigate several models for determining whether an AD should be generated for a time interval or not, based on the visual content of the interval and its neighbours; and (iii) what -- we implement a new vision-language model for this task, that can ingest the proposals from the character bank, whilst conditioning on the visual features using cross-attention, and demonstrate how this improves over previous architectures for AD text generation in an apples-to-apples comparison.

* ICCV 2023. Project page: https://www.robots.ox.ac.uk/vgg/research/autoad/
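To make the "who" component more concrete, here is a minimal sketch of how a character bank like the one described in the abstract might be represented and queried. The abstract only states that each entry holds a character name, an actor name, and a CLIP face feature; everything else here is an assumption for illustration: the `CharacterEntry` and `propose_characters` names, the cosine-similarity matching rule, the `threshold` value, and the toy 3-dimensional vectors standing in for real CLIP features. It is not the authors' method.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class CharacterEntry:
    """One character-bank entry with the three fields named in the abstract."""
    character: str             # character's name
    actor: str                 # actor who played the part
    face_feature: list[float]  # toy stand-in for a CLIP feature of the face


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def propose_characters(bank, frame_features, threshold=0.8):
    """Return character names whose face feature matches some frame feature.

    Hypothetical rule (an assumption, not from the paper): a character is
    proposed if its best cosine similarity over the frames is >= threshold.
    Names are returned in order of decreasing best similarity.
    """
    proposals = []
    for entry in bank:
        best = max(
            (cosine(entry.face_feature, f) for f in frame_features),
            default=0.0,  # no frames means no evidence for any character
        )
        if best >= threshold:
            proposals.append((entry.character, best))
    proposals.sort(key=lambda p: p[1], reverse=True)
    return [name for name, _ in proposals]


# Placeholder cast and features, invented purely for the example.
bank = [
    CharacterEntry("Character A", "Actor A", [1.0, 0.0, 0.0]),
    CharacterEntry("Character B", "Actor B", [0.0, 1.0, 0.0]),
]
frames = [[0.9, 0.1, 0.0], [0.8, 0.0, 0.2]]
print(propose_characters(bank, frames))
```

In a setup like this, the returned names would be the "proposals" that a downstream text generator could condition on; how the paper's model actually produces and consumes them is described in the paper itself, not here.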
