Abstract: DuctTake is a system designed to enable practical compositing of multiple takes of a scene into a single video. Current industry solutions are based on object segmentation, a hard problem that requires extensive manual input and cleanup, making compositing an expensive part of the film-making process. Our method instead composites shots together by finding optimal spatiotemporal seams using motion-compensated 3D graph cuts through the video volume. We describe in detail the required components, decisions, and new techniques that together make a usable, interactive tool for compositing HD video, paying special attention to the running time and performance of each component. We validate our approach by presenting a wide variety of examples and by comparing result quality and creation time to composites made by professional artists using current state-of-the-art tools.
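To make the seam-finding idea concrete, below is a minimal sketch of spatiotemporal seam selection via a graph cut over a small video volume. It assumes two already-aligned takes stored as float32 RGB arrays of shape (T, H, W, 3), hard-coded "left columns from take A, right columns from take B" constraints standing in for user input, and networkx's generic min-cut as a stand-in solver. The function name, cost function, and constraints are illustrative assumptions, not the authors' implementation, which additionally includes motion compensation and is engineered for interactive HD performance.

```python
import numpy as np
import networkx as nx

BIG = 1e9  # large capacity used for hard constraints


def composite_by_graph_cut(take_a, take_b, left_cols=2, right_cols=2):
    """Label each voxel (t, y, x) as coming from take_a or take_b by solving a
    min-cut over the 3D video volume (illustrative sketch only)."""
    take_a = take_a.astype(np.float32)
    take_b = take_b.astype(np.float32)
    T, H, W = take_a.shape[:3]
    G = nx.DiGraph()
    src, snk = "A", "B"

    def seam_cost(p, q):
        # Penalize cutting where the two takes disagree at either voxel,
        # so the seam prefers regions where A and B look alike.
        d = (np.abs(take_a[p] - take_b[p]).sum()
             + np.abs(take_a[q] - take_b[q]).sum())
        return float(d) + 1e-3  # epsilon keeps all capacities positive

    for t in range(T):
        for y in range(H):
            for x in range(W):
                p = (t, y, x)
                # Hard constraints (assumed user scribbles): leftmost columns
                # must come from take A, rightmost columns from take B.
                if x < left_cols:
                    G.add_edge(src, p, capacity=BIG)
                elif x >= W - right_cols:
                    G.add_edge(p, snk, capacity=BIG)
                # 6-connected neighbours in space and time.
                for q in ((t, y, x + 1), (t, y + 1, x), (t + 1, y, x)):
                    if q[0] < T and q[1] < H and q[2] < W:
                        c = seam_cost(p, q)
                        G.add_edge(p, q, capacity=c)
                        G.add_edge(q, p, capacity=c)

    _, (side_a, _) = nx.minimum_cut(G, src, snk)
    use_b = np.ones((T, H, W), dtype=bool)
    for p in side_a:
        if p != src:
            use_b[p] = False
    return np.where(use_b[..., None], take_b, take_a)
```

A dense per-voxel graph like this is far too slow for HD footage; the abstract's emphasis on running time suggests the real system relies on an optimized max-flow solver and a reduced (e.g., banded or downsampled) graph rather than this brute-force construction.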
Abstract: Ambisonics, i.e., full-sphere surround sound, is essential for 360-degree visual content to provide a realistic virtual reality (VR) experience. While the capture of 360-degree visual content has grown tremendously in recent years, estimating the corresponding spatial sound remains challenging because it requires sound-field microphones or knowledge of the sound-source locations. In this paper, we introduce the novel problem of generating Ambisonics for 360-degree videos using audio-visual cues. To this end, we first introduce a novel 360-degree audio-visual video dataset of 265 videos with annotated sound-source locations. Second, we design a pipeline for automatic Ambisonics estimation. Building on deep-learning-based audio-visual feature-embedding and prediction modules, our pipeline estimates the 3D sound-source locations and uses them to encode the audio into B-format. To benchmark our dataset and pipeline, we additionally propose evaluation criteria that assess performance across different 360-degree input representations. Our results demonstrate the efficacy of the proposed pipeline and open up a new area of research in 360-degree audio-visual analysis.
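As a concrete reference for the encoding step, the sketch below shows how a mono source signal with an estimated direction can be encoded into first-order Ambisonics B-format channels (W, X, Y, Z) using the standard traditional-B-format gains, with W attenuated by 1/sqrt(2). The function name and the choice of channel convention are assumptions for illustration; they are not necessarily the exact encoder or normalization used by the paper's pipeline.

```python
import numpy as np


def encode_foa_bformat(mono, azimuth, elevation):
    """Encode a mono signal into first-order Ambisonics (traditional B-format).

    mono      : 1-D array of audio samples
    azimuth   : source azimuth in radians (counter-clockwise from front)
    elevation : source elevation in radians (positive is up)

    Returns an array of shape (4, n_samples) ordered W, X, Y, Z.
    """
    w = mono * (1.0 / np.sqrt(2.0))                  # omnidirectional component
    x = mono * np.cos(azimuth) * np.cos(elevation)   # front-back
    y = mono * np.sin(azimuth) * np.cos(elevation)   # left-right
    z = mono * np.sin(elevation)                     # up-down
    return np.stack([w, x, y, z])


# Example: a 1 kHz tone placed 45 degrees to the left, slightly above the horizon.
sr = 48000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 1000 * t)
bformat = encode_foa_bformat(tone, np.deg2rad(45), np.deg2rad(10))
print(bformat.shape)  # (4, 48000)
```

With per-frame source directions estimated by the pipeline, such an encoder would be applied to each localized source and the resulting channels summed; in practice the channel ordering and normalization (e.g., FuMa versus ACN/SN3D) must match the convention used by the dataset and playback chain.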