Title: Epic-Sounds: A Large-scale Dataset of Actions That Sound

Feb 01, 2023

Jaesung Huh, Jacob Chalk, Evangelos Kazakos, Dima Damen, Andrew Zisserman

Abstract: We introduce EPIC-SOUNDS, a large-scale dataset of audio annotations capturing temporal extents and class labels within the audio stream of the egocentric videos. We propose an annotation pipeline where annotators temporally label distinguishable audio segments and describe the action that could have caused this sound. We identify actions that can be discriminated purely from audio by grouping these free-form descriptions of audio into classes. For actions that involve objects colliding, we collect human annotations of the materials of these objects (e.g. a glass object being placed on a wooden surface), which we verify from visual labels, discarding ambiguities. Overall, EPIC-SOUNDS includes 78.4k categorised segments of audible events and actions, distributed across 44 classes, as well as 39.2k non-categorised segments. We train and evaluate two state-of-the-art audio recognition models on our dataset, highlighting the importance of audio-only labels and the limitations of current models in recognising actions that sound.

* 6 pages, 4 figures
