Andrew Marmon

CamViG: Camera Aware Image-to-Video Generation with Multimodal Transformers

May 21, 2024

Andrew Marmon, Grant Schindler, José Lezama, Dan Kondratyuk, Bryan Seybold, Irfan Essa

Abstract: We extend multimodal transformers to include 3D camera motion as a conditioning signal for the task of video generation. Generative video models are becoming increasingly powerful, thus focusing research efforts on methods of controlling the output of such models. We propose to add virtual 3D camera controls to generative video methods by conditioning generated video on an encoding of three-dimensional camera movement over the course of the generated video. Results demonstrate that (1) we are able to successfully control the camera during video generation, starting from a single frame and a camera signal, and (2) the generated 3D camera paths are accurate, as measured using traditional computer vision methods.

Probabilistic Tracking with Deep Factors

Dec 02, 2021

Fan Jiang, Andrew Marmon, Ildebrando De Courten, Marc Rasi, Frank Dellaert

Abstract: In many applications of computer vision it is important to accurately estimate the trajectory of an object over time by fusing data from a number of sources, of which 2D and 3D imagery is only one. In this paper, we show how to use a deep feature encoding in conjunction with generative densities over the features in a factor-graph based, probabilistic tracking framework. We present a likelihood model that combines a learned feature encoder with generative densities over the encoded features, both trained in a supervised manner. We also experiment with directly inferring probability through the use of image classification models that feed into the likelihood formulation. These models are used to implement deep factors that are added to the factor graph to complement other factors that represent domain-specific knowledge such as motion models and/or other prior information. Factors are then optimized together in a non-linear least-squares tracking framework that takes the form of an Extended Kalman Smoother with a Gaussian prior. A key feature of our likelihood model is that it leverages the Lie group properties of the tracked target's pose to apply the feature encoding on an image patch, extracted through a differentiable warp function inspired by spatial transformer networks. To illustrate the proposed approach, we evaluate it on a challenging social insect behavior dataset, and show that using deep features outperforms the earlier linear appearance models used in this setting.
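
The abstract describes factors (a Gaussian prior, motion models, and appearance likelihoods) being optimized together as a least-squares problem. The sketch below is not the paper's method: it is a toy 1D, fully linear chain in which a plain scalar measurement stands in for the deep appearance factor, so the least-squares problem reduces to one tridiagonal solve. The function name `smooth_chain` and all of its parameters are hypothetical, chosen only for illustration.

```python
def smooth_chain(measurements, prior_mean, prior_sigma,
                 step, motion_sigma, meas_sigma):
    """MAP estimate of a 1D trajectory x_0..x_{n-1} from three factor types.

    Factors (all Gaussian, all linear in this toy setting):
      prior:        x_0 ~ N(prior_mean, prior_sigma^2)
      motion:       x_{k+1} - x_k ~ N(step, motion_sigma^2)
      measurement:  z_k - x_k ~ N(0, meas_sigma^2)   (stand-in for a deep factor)
    """
    n = len(measurements)
    diag = [0.0] * n          # diagonal of the normal-equation matrix
    off = [0.0] * (n - 1)     # off-diagonal (symmetric tridiagonal)
    rhs = [0.0] * n

    w_prior = 1.0 / prior_sigma ** 2
    w_motion = 1.0 / motion_sigma ** 2
    w_meas = 1.0 / meas_sigma ** 2

    # Prior factor on the first state.
    diag[0] += w_prior
    rhs[0] += w_prior * prior_mean

    # One measurement factor per state.
    for k, z in enumerate(measurements):
        diag[k] += w_meas
        rhs[k] += w_meas * z

    # One motion factor per pair of consecutive states.
    for k in range(n - 1):
        diag[k] += w_motion
        diag[k + 1] += w_motion
        off[k] -= w_motion
        rhs[k] -= w_motion * step
        rhs[k + 1] += w_motion * step

    # Solve the tridiagonal system: forward elimination, back substitution.
    for k in range(1, n):
        m = off[k - 1] / diag[k - 1]
        diag[k] -= m * off[k - 1]
        rhs[k] -= m * rhs[k - 1]
    x = [0.0] * n
    x[n - 1] = rhs[n - 1] / diag[n - 1]
    for k in range(n - 2, -1, -1):
        x[k] = (rhs[k] - off[k] * x[k + 1]) / diag[k]
    return x


if __name__ == "__main__":
    noisy = [0.1, 0.9, 2.2, 2.8, 4.1]
    print(smooth_chain(noisy, prior_mean=0.0, prior_sigma=0.5,
                       step=1.0, motion_sigma=0.2, meas_sigma=0.5))
```

In the setting the abstract describes, the states are poses on a Lie group and the appearance factors are non-linear, so a solve like this would be repeated on linearized factors rather than run once.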
