Ofir Bibi

LTX-Video: Realtime Video Latent Diffusion

Dec 30, 2024

Yoav HaCohen, Nisan Chiprut, Benny Brazowski, Daniel Shalem, Dudu Moshe, Eitan Richardson, Eran Levin, Guy Shiran, Nir Zabari, Ori Gordon (+6 more)

Abstract: We introduce LTX-Video, a transformer-based latent diffusion model that adopts a holistic approach to video generation by seamlessly integrating the responsibilities of the Video-VAE and the denoising transformer. Unlike existing methods, which treat these components as independent, LTX-Video aims to optimize their interaction for improved efficiency and quality. At its core is a carefully designed Video-VAE that achieves a high compression ratio of 1:192, with spatiotemporal downscaling of 32 x 32 x 8 pixels per token, enabled by relocating the patchifying operation from the transformer's input to the VAE's input. Operating in this highly compressed latent space enables the transformer to efficiently perform full spatiotemporal self-attention, which is essential for generating high-resolution videos with temporal consistency. However, the high compression inherently limits the representation of fine details. To address this, our VAE decoder is tasked with both latent-to-pixel conversion and the final denoising step, producing the clean result directly in pixel space. This approach preserves the ability to generate fine details without incurring the runtime cost of a separate upsampling module. Our model supports diverse use cases, including text-to-video and image-to-video generation, with both capabilities trained simultaneously. It achieves faster-than-real-time generation, producing 5 seconds of 24 fps video at 768x512 resolution in just 2 seconds on an Nvidia H100 GPU, outperforming all existing models of similar scale. The source code and pre-trained models are publicly available, setting a new benchmark for accessible and scalable video generation.
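
The compression figures quoted in the abstract can be checked with a little arithmetic. The sketch below is only an illustration: the function name and parameters are hypothetical, and it assumes RGB input (3 channels per pixel) and a clip whose width, height, and frame count divide evenly by the 32 x 32 x 8 downscaling factors. The latent channel count it prints is what the 1:192 ratio would imply under those assumptions; it is not a value stated in the abstract.

```python
# Token-count arithmetic for the figures quoted in the abstract.
# Assumptions (not stated in the abstract): 3-channel RGB input, and a clip
# whose dimensions divide evenly by the downscaling factors.

def latent_token_stats(width, height, frames, patch=(32, 32, 8),
                       pixel_channels=3, compression_ratio=192):
    """Return (token count, pixel values per token, implied latent channels)."""
    pw, ph, pt = patch
    # One token per 32 x 32 x 8 block of pixels.
    tokens = (width // pw) * (height // ph) * (frames // pt)
    # Raw values covered by one token: pixels in the block times channels.
    values_per_token = pw * ph * pt * pixel_channels
    # Channels per token that a 1:192 compression ratio would leave.
    latent_channels = values_per_token // compression_ratio
    return tokens, values_per_token, latent_channels


# 5 seconds of 24 fps video at 768x512, treated here as exactly 120 frames.
tokens, values_per_token, latent_channels = latent_token_stats(768, 512, 5 * 24)
print(tokens, values_per_token, latent_channels)
```

Under these assumptions the clip maps to a few thousand tokens rather than tens of millions of pixels, which is why full spatiotemporal self-attention becomes affordable in the latent space.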

Temporally stable video segmentation without video annotations

Oct 17, 2021

Aharon Azulay, Tavi Halperin, Orestis Vantzos, Nadav Bornstein, Ofir Bibi

Abstract: Temporally consistent dense video annotations are scarce and hard to collect. In contrast, image segmentation datasets (and pre-trained models) are ubiquitous, and easier to label for any novel task. In this paper, we introduce a method to adapt still image segmentation models to video in an unsupervised manner, by using an optical flow-based consistency measure. To ensure that the inferred segmented videos appear more stable in practice, we verify that the consistency measure is well correlated with human judgement via a user study. Training a new multi-input multi-output decoder using this measure as a loss, together with a technique for refining current image segmentation datasets and a temporal weighted-guided filter, we observe stability improvements in the generated segmented videos with minimal loss of accuracy.
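
To make the idea of an optical flow-based consistency measure concrete, here is a minimal sketch. It is not the paper's measure: the function name, the integer backward-flow convention, and the hard-label agreement score are all assumptions chosen for illustration, and the flow field is taken as given rather than estimated.

```python
# Hypothetical sketch of a flow-warped consistency score between two
# segmentation maps. This is an illustration, not the paper's definition.

def flow_consistency(seg_prev, seg_curr, flow):
    """Fraction of current-frame pixels whose label agrees with the
    previous-frame label found by following the flow.

    flow[y][x] = (dx, dy) is an assumed integer backward flow: pixel (x, y)
    in the current frame came from (x + dx, y + dy) in the previous frame.
    Pixels whose source falls outside the frame are ignored.
    """
    h, w = len(seg_curr), len(seg_curr[0])
    agree = valid = 0
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            sx, sy = x + dx, y + dy
            if 0 <= sx < w and 0 <= sy < h:
                valid += 1
                agree += seg_prev[sy][sx] == seg_curr[y][x]
    return agree / valid if valid else 1.0


# Toy example: the scene shifts one pixel to the right between frames.
prev = [[0, 1, 1, 0],
        [0, 1, 1, 0]]
flow = [[(-1, 0)] * 4 for _ in range(2)]

curr_stable = [[0, 0, 1, 1],
               [0, 0, 1, 1]]   # labels follow the motion
curr_flicker = [[0, 0, 1, 1],
                [0, 0, 0, 1]]  # one label flips against the motion

stable_score = flow_consistency(prev, curr_stable, flow)
flicker_score = flow_consistency(prev, curr_flicker, flow)
print(stable_score, flicker_score)
```

A score like this only counts hard label agreement. Using such a measure as a training loss, as the abstract describes, would require a differentiable variant, for example one that compares warped class probabilities instead of labels.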

Endless Loops: Detecting and Animating Periodic Patterns in Still Images

May 19, 2021

Tavi Halperin, Hanit Hakim, Orestis Vantzos, Gershon Hochman, Netai Benaim, Lior Sassy, Michael Kupchik, Ofir Bibi, Ohad Fried

Abstract: We present an algorithm for producing a seamless animated loop from a single image. The algorithm detects periodic structures, such as the windows of a building or the steps of a staircase, and generates a non-trivial displacement vector field that maps each segment of the structure onto a neighboring segment along a user- or auto-selected main direction of motion. This displacement field is used, together with suitable temporal and spatial smoothing, to warp the image and produce the frames of a continuous animation loop. Our cinemagraphs are created in under a second on a mobile device. Over 140,000 users downloaded our app and exported over 350,000 cinemagraphs. Moreover, we conducted two user studies that show that users prefer our method for creating surreal and structured cinemagraphs compared to more manual approaches and compared to previous methods.

* ACM Trans. Graph., Vol. 40, No. 4, Article 142. Publication date: August 2021
* SIGGRAPH 2021. Project page: https://pub.res.lightricks.com/endless-loops/ . Video: https://youtu.be/8ZYUvxWuD2Y
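
The loop-closing property described in the abstract can be illustrated in one dimension. The sketch below is a simplified assumption-laden toy, not the paper's algorithm: it uses a constant displacement equal to a known period (the paper detects the structure and builds a non-trivial field), clamps at the borders, and omits the temporal and spatial smoothing. All function names are hypothetical.

```python
# 1D toy: warping a periodic signal by a fraction of one period per frame.
# After a full period each segment lands on its neighbor, so the loop closes.

def sample(signal, pos):
    """Linearly interpolate signal at a fractional position, clamped to the ends."""
    pos = min(max(pos, 0.0), len(signal) - 1.0)
    i = int(pos)
    j = min(i + 1, len(signal) - 1)
    frac = pos - i
    return (1 - frac) * signal[i] + frac * signal[j]


def loop_frames(signal, period, n_frames):
    """Frame k samples the signal displaced by (k / n_frames) * period."""
    frames = []
    for k in range(n_frames):
        t = k / n_frames
        frames.append([sample(signal, x + t * period) for x in range(len(signal))])
    return frames


period = 4
signal = [0, 1, 2, 3] * 4          # four repeats of a period-4 pattern
frames = loop_frames(signal, period, n_frames=4)

# The frame that would follow the last one (displacement of a full period)
# matches frame 0 away from the clamped border, so playback can wrap around.
wrap = [sample(signal, x + period) for x in range(len(signal) - period)]
print(wrap == signal[:len(signal) - period])
```

The clamped border is where this toy breaks down; handling image boundaries and non-uniform structures is exactly where smoothing of the displacement field would matter.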

Clear Skies Ahead: Towards Real-Time Automatic Sky Replacement in Video

Mar 06, 2019

Tavi Halperin, Harel Cain, Ofir Bibi, Michael Werman

Abstract: Digital videos such as those captured by a smartphone often exhibit exposure inconsistencies, a poorly exposed sky, or simply suffer from an uninteresting or plain looking sky. Professionals may edit these videos using advanced and time-consuming tools unavailable to most users, to replace the sky with a more expressive or imaginative sky. In this work, we propose an algorithm for automatic replacement of the sky region in a video with a different sky, providing nonprofessional users with a simple yet efficient tool to seamlessly replace the sky. The method is fast, achieving close to real-time performance on mobile devices and the user's involvement can remain as limited as simply selecting the replacement sky.

* Eurographics 2019. Supplementary video: https://youtu.be/1uZ46YzX-pI
