Abstract: We present Track Anything Behind Everything (TABE), a novel dataset, pipeline, and evaluation framework for zero-shot amodal completion from visible masks. Unlike existing methods that require pretrained class labels, our approach uses a single query mask from the first frame in which the object is visible, enabling flexible, zero-shot inference. Our dataset, TABE-51, provides highly accurate ground-truth amodal segmentation masks without the need for human estimation or 3D reconstruction. Our TABE pipeline is specifically designed to handle amodal completion even in scenarios where objects are completely occluded. We also introduce a specialised evaluation framework that isolates amodal completion performance, free from the influence of traditional visual segmentation metrics.
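The abstract does not spell out how the evaluation framework isolates amodal completion, but one natural reading is to score predictions only on occluded pixels, i.e. the ground-truth amodal mask minus the visible mask, so that ordinary visible-segmentation quality cannot dominate the metric. The following is a minimal sketch under that assumption; the function and variable names (occluded_iou, amodal_gt, visible_gt, amodal_pred) are our own illustrative choices, not names from the paper:

```python
import numpy as np

def occluded_iou(amodal_gt: np.ndarray,
                 visible_gt: np.ndarray,
                 amodal_pred: np.ndarray) -> float:
    """IoU restricted to the occluded region (hypothetical metric sketch).

    All arguments are boolean masks of shape (H, W). The occluded region
    is the part of the ground-truth amodal mask that is not visible, so
    the score reflects only how well the hidden extent was completed.
    """
    occluded_gt = amodal_gt & ~visible_gt        # hidden ground truth
    occluded_pred = amodal_pred & ~visible_gt    # predicted hidden part
    union = (occluded_gt | occluded_pred).sum()
    if union == 0:                               # object never occluded
        return 1.0
    return float((occluded_gt & occluded_pred).sum()) / float(union)
```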
Abstract: In this paper we propose a novel method for zero-shot, cross-domain image retrieval in which we make two key contributions. The first is a test-time re-ranking procedure that enables query-gallery pairs without meaningful shared visual features to be matched by incorporating gallery-gallery ranks into an iterative re-ranking process. The second is the use of cross-attention at training time and knowledge distillation to encourage cross-attention-like features to be extracted at test time from a single image. When combined with the Vision Transformer architecture and zero-shot retrieval losses, our approach yields state-of-the-art results on the Sketchy and TU-Berlin sketch-based image retrieval benchmarks. However, unlike many previous methods, none of the components in our approach is engineered specifically towards the sketch-based image retrieval task; it can be applied generally to any cross-domain, zero-shot retrieval task. We therefore also show results on zero-shot cartoon-to-photo retrieval using the Office-Home dataset.
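The abstract describes the re-ranking procedure only at a high level. One way to realise "incorporating gallery-gallery ranks into an iterative re-ranking process" is to repeatedly blend each query-gallery score with the gallery-gallery similarities of the query's current top-ranked neighbours, so that gallery-side structure can bridge query-gallery pairs with little direct visual overlap. The sketch below is our own illustration under that assumption; rerank, k, alpha, and num_iters are hypothetical names and parameters, not taken from the paper:

```python
import numpy as np

def rerank(qg_sim: np.ndarray,
           gg_sim: np.ndarray,
           k: int = 10,
           alpha: float = 0.5,
           num_iters: int = 3) -> np.ndarray:
    """Iterative re-ranking sketch (hypothetical, not the paper's exact rule).

    qg_sim: (num_queries, num_gallery) query-gallery similarities.
    gg_sim: (num_gallery, num_gallery) gallery-gallery similarities.
    Each iteration replaces a query's score for every gallery item with
    a blend of its current score and the item's average similarity to
    the query's current top-k gallery neighbours.
    """
    sim = qg_sim.copy()
    for _ in range(num_iters):
        # indices of the current top-k gallery items for each query
        topk = np.argsort(-sim, axis=1)[:, :k]        # (Q, k)
        # average gallery-gallery similarity to those neighbours
        neighbour_sim = gg_sim[topk].mean(axis=1)     # (Q, G)
        sim = alpha * sim + (1.0 - alpha) * neighbour_sim
    return sim
```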