Amy Zhao

Painting Many Pasts: Synthesizing Time Lapse Videos of Paintings

Jan 04, 2020

Amy Zhao, Guha Balakrishnan, Kathleen M. Lewis, Frédo Durand, John V. Guttag, Adrian V. Dalca

Abstract: We introduce a new video synthesis task: synthesizing time lapse videos depicting how a given painting might have been created. Artists paint using unique combinations of brushes, strokes, colors, and layers. There are often many possible ways to create a given painting. Our goal is to learn to capture this rich range of possibilities. Creating distributions of long-term videos is a challenge for learning-based video synthesis methods. We present a probabilistic model that, given a single image of a completed painting, recurrently synthesizes steps of the painting process. We implement this model as a convolutional neural network, and introduce a training scheme to facilitate learning from a limited dataset of painting time lapses. We demonstrate that this model can be used to sample many time steps, enabling long-term stochastic video synthesis. We evaluate our method on digital and watercolor paintings collected from video websites, and show that human raters find our synthesized videos to be similar to time lapses produced by real artists.

Visual Deprojection: Probabilistic Recovery of Collapsed Dimensions

Sep 01, 2019

Guha Balakrishnan, Adrian V. Dalca, Amy Zhao, John V. Guttag, Fredo Durand, William T. Freeman

Abstract: We introduce visual deprojection: the task of recovering an image or video that has been collapsed along a dimension. Projections arise in various contexts, such as long-exposure photography, where a dynamic scene is collapsed in time to produce a motion-blurred image, and corner cameras, where reflected light from a scene is collapsed along a spatial dimension because of an edge occluder to yield a 1D video. Deprojection is ill-posed: often there are many plausible solutions for a given input. We first propose a probabilistic model capturing the ambiguity of the task. We then present a variational inference strategy using convolutional neural networks as functional approximators. Sampling from the inference network at test time yields plausible candidates from the distribution of original signals that are consistent with a given input projection. We evaluate the method on several datasets for both spatial and temporal deprojection tasks. We first demonstrate the method can recover human gait videos and face images from spatial projections, and then show that it can recover videos of moving digits from dramatically motion-blurred images obtained via temporal projection.

* ICCV 2019
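
The projections described in this abstract, and why inverting them is ill-posed, can be illustrated with a small NumPy sketch. This is not the authors' code: the function names are hypothetical, and averaging along the collapsed axis is an assumed stand-in for the physical projection (a sum or weighted integral would serve equally well for the illustration).

```python
import numpy as np

def temporal_projection(video):
    """Collapse a (T, H, W) video along time by averaging, like a long exposure."""
    return video.mean(axis=0)

def spatial_projection(video, axis=2):
    """Collapse a (T, H, W) video along one spatial axis, leaving a 1D video."""
    return video.mean(axis=axis)

# Two toy 2-frame videos: a bright pixel moving in opposite directions.
left_to_right = np.zeros((2, 1, 2))
left_to_right[0, 0, 0] = 1.0
left_to_right[1, 0, 1] = 1.0
right_to_left = left_to_right[::-1].copy()

blur_a = temporal_projection(left_to_right)
blur_b = temporal_projection(right_to_left)

# Different videos, identical projection: the inverse problem has many solutions.
same_projection = np.allclose(blur_a, blur_b)
different_videos = not np.array_equal(left_to_right, right_to_left)
```

Because both videos collapse to the same blurred image, a single point estimate cannot be "the" answer, which is the motivation for sampling candidates from a distribution.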

Data augmentation using learned transformations for one-shot medical image segmentation

Apr 06, 2019

Amy Zhao, Guha Balakrishnan, Frédo Durand, John V. Guttag, Adrian V. Dalca

Abstract: Image segmentation is an important task in many medical applications. Methods based on convolutional neural networks attain state-of-the-art accuracy; however, they typically rely on supervised training with large labeled datasets. Labeling medical images requires significant expertise and time, and typical hand-tuned approaches for data augmentation fail to capture the complex variations in such images. We present an automated data augmentation method for synthesizing labeled medical images. We demonstrate our method on the task of segmenting magnetic resonance imaging (MRI) brain scans. Our method requires only a single segmented scan, and leverages other unlabeled scans in a semi-supervised approach. We learn a model of transformations from the images, and use the model along with the labeled example to synthesize additional labeled examples. Each transformation comprises a spatial deformation field and an intensity change, enabling the synthesis of complex effects such as variations in anatomy and image acquisition procedures. We show that training a supervised segmenter with these new examples provides significant improvements over state-of-the-art methods for one-shot biomedical image segmentation. Our code is available at https://github.com/xamyzhao/brainstorm.

* 9 pages, CVPR 2019
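
The synthesis step described in this abstract (a spatial deformation plus an intensity change applied to one labeled example) can be sketched in NumPy as follows. This is a toy 2D illustration under stated assumptions, not the code from the linked repository: the function names are hypothetical, the transformations are fixed by hand rather than learned, and the order of the two operations is an assumption.

```python
import numpy as np

def warp_nearest(array, displacement):
    """Warp a 2D array with a per-pixel displacement field of shape (2, H, W).

    Nearest-neighbor sampling is used so that label values are never blended.
    """
    h, w = array.shape
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_r = np.clip(np.rint(rows + displacement[0]).astype(int), 0, h - 1)
    src_c = np.clip(np.rint(cols + displacement[1]).astype(int), 0, w - 1)
    return array[src_r, src_c]

def synthesize_example(image, labels, displacement, intensity_delta):
    """Create a new (image, labels) pair from a single labeled example."""
    # The intensity change affects only the image; the labels are only warped.
    new_image = warp_nearest(image + intensity_delta, displacement)
    new_labels = warp_nearest(labels, displacement)
    return new_image, new_labels

image = np.arange(16, dtype=float).reshape(4, 4)
labels = (image >= 8).astype(int)       # toy two-class segmentation
displacement = np.zeros((2, 4, 4))
displacement[1] = 1.0                    # sample from one column to the right
intensity_delta = np.full((4, 4), 0.5)   # toy global brightness change

new_image, new_labels = synthesize_example(image, labels, displacement, intensity_delta)
```

Applying the same deformation to the image and to its segmentation is what keeps the synthesized pair consistent, so the new example can be used directly as supervised training data.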

VoxelMorph: A Learning Framework for Deformable Medical Image Registration

Sep 14, 2018

Guha Balakrishnan, Amy Zhao, Mert R. Sabuncu, John Guttag, Adrian V. Dalca

Abstract: We present VoxelMorph, a fast, unsupervised, learning-based algorithm for deformable pairwise medical image registration. Traditional registration methods optimize an objective function independently for each pair of images, which is time-consuming for large datasets. We define registration as a parametric function, implemented as a convolutional neural network (CNN). We optimize its global parameters given a set of images from a collection of interest. Given a new pair of scans, VoxelMorph rapidly computes a deformation field by directly evaluating the function. Our model is flexible, enabling the use of any differentiable objective function to optimize these parameters. In this work, we propose and extensively evaluate a standard image matching objective function as well as an objective function that can use auxiliary data such as anatomical segmentations available only at training time. We demonstrate that the unsupervised model's accuracy is comparable to state-of-the-art methods, while operating orders of magnitude faster. We also show that VoxelMorph trained with auxiliary data significantly improves registration accuracy at test time. Our method promises to significantly speed up medical image analysis and processing pipelines, while facilitating novel directions in learning-based registration and its applications. Our code is freely available at voxelmorph.csail.mit.edu.

* Submitted to IEEE TMI. This manuscript expands on the CVPR 2018 paper (arXiv:1802.02604) by presenting an auxiliary learning model, an amortized optimization analysis, and more extensive model evaluations

An Unsupervised Learning Model for Deformable Medical Image Registration

Apr 20, 2018

Guha Balakrishnan, Amy Zhao, Mert R. Sabuncu, John Guttag, Adrian V. Dalca

Abstract: We present a fast learning-based algorithm for deformable, pairwise 3D medical image registration. Current registration methods optimize an objective function independently for each pair of images, which can be time-consuming for large data. We define registration as a parametric function, and optimize its parameters given a set of images from a collection of interest. Given a new pair of scans, we can quickly compute a registration field by directly evaluating the function using the learned parameters. We model this function using a convolutional neural network (CNN), and use a spatial transform layer to reconstruct one image from another while imposing smoothness constraints on the registration field. The proposed method does not require supervised information such as ground truth registration fields or anatomical landmarks. We demonstrate registration accuracy comparable to state-of-the-art 3D image registration, while operating orders of magnitude faster in practice. Our method promises to significantly speed up medical image analysis and processing pipelines, while facilitating novel directions in learning-based registration and its applications. Our code is available at https://github.com/balakg/voxelmorph .

* 9 pages, in CVPR 2018
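
The kind of unsupervised objective this abstract describes (reconstruct one image from the other while keeping the registration field smooth) can be sketched as below. This is a minimal 2D illustration, not the released implementation: the abstract does not name the loss terms, so mean squared error for similarity, a squared finite-difference penalty for smoothness, the function names, and the `smooth_weight` value are all assumptions.

```python
import numpy as np

def smoothness_penalty(displacement):
    """Mean squared finite difference of a (2, H, W) displacement field."""
    d_rows = np.diff(displacement, axis=1)
    d_cols = np.diff(displacement, axis=2)
    return float((d_rows ** 2).mean() + (d_cols ** 2).mean())

def registration_loss(fixed, warped_moving, displacement, smooth_weight=0.01):
    """Image similarity term plus a weighted smoothness term (both assumed)."""
    similarity = float(((fixed - warped_moving) ** 2).mean())
    return similarity + smooth_weight * smoothness_penalty(displacement)

fixed = np.ones((4, 4))
warped_moving = fixed.copy()        # pretend the warp reconstructed it perfectly

uniform_field = np.ones((2, 4, 4))  # constant shift: no spatial variation
bumpy_field = np.zeros((2, 4, 4))
bumpy_field[0, 1, 1] = 1.0          # one isolated spike in the field

loss_uniform = registration_loss(fixed, warped_moving, uniform_field)
loss_bumpy = registration_loss(fixed, warped_moving, bumpy_field)
```

No ground-truth field appears anywhere in the loss, which is what makes the training unsupervised: only the two images and the predicted field are needed.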

Synthesizing Images of Humans in Unseen Poses

Apr 20, 2018

Guha Balakrishnan, Amy Zhao, Adrian V. Dalca, Fredo Durand, John Guttag

Abstract: We address the computational problem of novel human pose synthesis. Given an image of a person and a desired pose, we produce a depiction of that person in that pose, retaining the appearance of both the person and background. We present a modular generative neural network that synthesizes unseen poses using training pairs of images and poses taken from human action videos. Our network separates a scene into different body part and background layers, moves body parts to new locations and refines their appearances, and composites the new foreground with a hole-filled background. These subtasks, implemented with separate modules, are trained jointly using only a single target image as a supervised label. We use an adversarial discriminator to force our network to synthesize realistic details conditioned on pose. We demonstrate image synthesis results on three action classes: golf, yoga/workouts and tennis, and show that our method produces accurate results within action classes as well as across action classes. Given a sequence of desired poses, we also produce coherent videos of actions.

* CVPR 2018

A Video-Based Method for Objectively Rating Ataxia

Sep 07, 2017

Ronnachai Jaroensri, Amy Zhao, Guha Balakrishnan, Derek Lo, Jeremy Schmahmann, John Guttag, Fredo Durand

Abstract: For many movement disorders, such as Parkinson's disease and ataxia, disease progression is visually assessed by a clinician using a numerical disease rating scale. These tests are subjective, time-consuming, and must be administered by a professional. This can be problematic where specialists are not available, or when a patient is not consistently evaluated by the same clinician. We present an automated method for quantifying the severity of motion impairment in patients with ataxia, using only video recordings. We consider videos of the finger-to-nose test, a common movement task used as part of the assessment of ataxia progression during the course of routine clinical checkups. Our method uses neural network-based pose estimation and optical flow techniques to track the motion of the patient's hand in a video recording. We extract features that describe qualities of the motion such as speed and variation in performance. Using labels provided by an expert clinician, we train a supervised learning model that predicts severity according to the Brief Ataxia Rating Scale (BARS). The performance of our system is comparable to that of a group of ataxia specialists in terms of mean error and correlation, and our system's predictions were consistently within the range of inter-rater variability. This work demonstrates the feasibility of using computer vision and machine learning to produce consistent and clinically useful measures of motor impairment.

* MLHC 2017
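
As a rough illustration of the feature-extraction step mentioned in this abstract, the sketch below computes simple speed statistics from a tracked hand trajectory. The function name, the choice of features, and the toy trajectory are hypothetical; the abstract only states that features such as speed and variation in performance are used, not how they are computed.

```python
import numpy as np

def motion_features(positions, fps):
    """Speed statistics for an (N, 2) array of per-frame hand positions in pixels."""
    velocities = np.diff(positions, axis=0) * fps   # pixels per second
    speeds = np.linalg.norm(velocities, axis=1)
    return {
        "mean_speed": float(speeds.mean()),
        "speed_std": float(speeds.std()),           # variation across the movement
    }

# Toy trajectory: the hand moves 5 pixels per frame at a constant rate.
track = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
features = motion_features(track, fps=30.0)
```

Features like these would then be passed, together with clinician-provided labels, to a supervised model that predicts a severity score.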
