Abstract: Unsupervised optical flow estimators based on deep learning have attracted increasing attention due to the cost and difficulty of annotating ground truth. Although performance, as measured by average End-Point Error (EPE), has improved over the years, flow estimates are still poorer along motion boundaries (MBs), where the flow is not smooth, as is typically assumed, and where features computed by neural networks are contaminated by multiple motions. To improve flow in the unsupervised setting, we design a framework that detects MBs by analyzing visual changes along boundary candidates and replaces motions close to detections with motions farther away. Our proposed algorithm detects boundaries more accurately than a baseline method with the same inputs and can improve estimates from any flow predictor without additional training.
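The replacement step can be pictured with a short sketch: flow vectors within a small band around detected MBs are overwritten with the flow of the nearest pixel outside the band. The band width, the Euclidean nearest-pixel rule, and the name replace_flow_near_boundaries are illustrative assumptions, not the paper's exact procedure:

```python
# Sketch: replace flow near detected motion boundaries with flow from
# farther away. Assumes an MB mask is already available from a detector.
import numpy as np
from scipy.ndimage import binary_dilation, distance_transform_edt

def replace_flow_near_boundaries(flow, mb_mask, band=3):
    """flow: (H, W, 2) flow field; mb_mask: (H, W) bool, True on detected MBs."""
    # Pixels within `band` pixels of a boundary are treated as
    # contaminated by multiple motions (band width is an assumption).
    contaminated = binary_dilation(mb_mask, iterations=band)
    # For each pixel, indices of the nearest pixel outside the band;
    # clean pixels map to themselves.
    _, idx = distance_transform_edt(contaminated, return_indices=True)
    iy, ix = idx
    refined = flow.copy()
    refined[contaminated] = flow[iy[contaminated], ix[contaminated]]
    return refined
```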
Abstract: We propose TAIN (Transformers and Attention for video INterpolation), a residual neural network for video interpolation that aims to interpolate an intermediate frame given the two consecutive frames around it. We first present a novel visual transformer module, named Cross-Similarity (CS), to globally aggregate input image features whose appearance is similar to that of the predicted interpolated frame. These CS features are then used to refine the interpolated prediction. To account for occlusions in the CS features, we propose an Image Attention (IA) module that allows the network to favor CS features from one frame over those of the other. Additionally, we augment our training dataset with an occluder patch that moves across frames to improve the network's robustness to occlusions and large motion. Because existing methods yield overly smooth predictions, especially near motion boundaries (MBs), we use an additional training loss based on image gradients to yield sharper predictions. TAIN outperforms existing methods that do not require flow estimation and performs comparably to flow-based methods, while being computationally efficient in terms of inference time, on the Vimeo90k, UCF101, and SNU-FILM benchmarks.
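A minimal PyTorch sketch of the kind of cross-attention the CS module describes: features of the interpolated prediction query one input frame's features globally, aggregating locations with similar appearance. The single-head design, the 1x1-convolution projections, and all tensor names are assumptions rather than TAIN's exact architecture:

```python
# Sketch of a Cross-Similarity-style module: prediction features attend
# globally to input-frame features with similar appearance.
import torch
import torch.nn as nn

class CrossSimilarity(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.q = nn.Conv2d(channels, channels, 1)  # query from prediction
        self.k = nn.Conv2d(channels, channels, 1)  # key from input frame
        self.v = nn.Conv2d(channels, channels, 1)  # value from input frame
        self.scale = channels ** -0.5

    def forward(self, pred_feat, frame_feat):
        b, c, h, w = pred_feat.shape
        q = self.q(pred_feat).flatten(2).transpose(1, 2)   # (B, HW, C)
        k = self.k(frame_feat).flatten(2)                  # (B, C, HW)
        v = self.v(frame_feat).flatten(2).transpose(1, 2)  # (B, HW, C)
        attn = torch.softmax(q @ k * self.scale, dim=-1)   # (B, HW, HW)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return out
```

An IA-style weighting could then blend the CS outputs of the two frames with a per-pixel attention map, so occluded content in one frame is drawn from the other.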
Abstract: We propose MONet, a convolutional neural network that jointly detects motion boundaries (MBs) and occlusion regions (Occs) in video, both forward and backward in time. Detection is difficult because optical flow is discontinuous along MBs and undefined in Occs, while many flow estimators assume smoothness and a flow defined everywhere. To reason in the two time directions simultaneously, we direct-warp the estimated maps between the two frames. Since appearance mismatches between frames often signal vicinity to MBs or Occs, we construct a cost block that records, for each feature in one frame, the lowest discrepancy with matching features within a search range in the other frame. This cost block is two-dimensional and therefore much less expensive than the four-dimensional cost volumes used in flow analysis. Cost-block features are computed by an encoder, and MB and Occ estimates are computed by a decoder. We found that arranging decoder layers fine-to-coarse, rather than coarse-to-fine, improves performance. MONet outperforms the prior state of the art for both tasks on the Sintel and FlyingChairsOcc benchmarks without any fine-tuning on them.
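A rough sketch of the 2D cost block under stated assumptions: for each pixel's feature in one frame, record the lowest discrepancy over a local search window in the other frame, yielding one value per pixel rather than a full 4D cost volume. The window radius and the squared-L2 discrepancy are assumptions:

```python
# Sketch: 2D cost block keeping only the minimum matching cost per pixel.
import torch

def cost_block(feat1, feat2, radius=4):
    """feat1, feat2: (B, C, H, W) feature maps; returns (B, 1, H, W)."""
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # torch.roll wraps at image borders; a padded shift would be
            # more faithful, but this keeps the sketch short.
            shifted = torch.roll(feat2, shifts=(dy, dx), dims=(2, 3))
            cost = (feat1 - shifted).pow(2).sum(dim=1, keepdim=True)
            best = cost if best is None else torch.min(best, cost)
    return best  # one value per pixel: a 2D cost map, not a 4D volume
```

Taking the minimum over displacements is what collapses the pairwise costs of a 4D volume into a 2D map, at the price of discarding where the best match was found.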