Oscar Vikström

Learning Explicit Object-Centric Representations with Vision Transformers

Oct 25, 2022

Oscar Vikström, Alexander Ilin

Abstract: With the recent successful adaptation of transformers to the vision domain, particularly when trained in a self-supervised fashion, it has been shown that vision transformers can learn impressive object-reasoning-like behaviour and features that are expressive for object segmentation in images. In this paper, we build on the self-supervision task of masked autoencoding and explore its effectiveness for explicitly learning object-centric representations with transformers. To this end, we design an object-centric autoencoder using transformers only and train it end-to-end to reconstruct full images from unmasked patches. We show that the model efficiently learns to decompose simple scenes, as measured by segmentation metrics on several multi-object benchmarks.
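The abstract names two ingredients without detail: masked-autoencoding training, where only a subset of image patches reaches the encoder while the reconstruction target is the full image, and evaluation with segmentation metrics. The sketch below illustrates both under stated assumptions and does not reproduce the paper's transformer architecture. `patchify` and `random_patch_mask` are hypothetical helpers following the generic masked-autoencoder recipe; the 0.75 mask ratio is the default from the original MAE work, not a value taken from this paper. The adjusted Rand index (ARI) is assumed as the segmentation metric because it is common on multi-object benchmarks; the abstract does not say which metrics the paper reports.

```python
import numpy as np


def patchify(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping square patches.

    Returns shape (num_patches, patch_size * patch_size * C), with patches
    ordered row-major over the patch grid.
    """
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0, "image must tile evenly"
    gh, gw = h // patch_size, w // patch_size
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)  # (gh, gw, p, p, c)
    return patches.reshape(gh * gw, patch_size * patch_size * c)


def random_patch_mask(num_patches: int, mask_ratio: float, rng: np.random.Generator):
    """Generic MAE-style masking (assumed): return sorted (visible_idx, masked_idx)."""
    num_keep = int(round(num_patches * (1.0 - mask_ratio)))
    perm = rng.permutation(num_patches)
    return np.sort(perm[:num_keep]), np.sort(perm[num_keep:])


def adjusted_rand_index(true_labels, pred_labels) -> float:
    """Adjusted Rand index between two flat labelings of the same pixels.

    Invariant to permutations of label ids, which is what makes it usable for
    comparing unordered object slots against ground-truth instance masks.
    """
    true_labels = np.asarray(true_labels).ravel()
    pred_labels = np.asarray(pred_labels).ravel()
    _, t = np.unique(true_labels, return_inverse=True)
    _, p = np.unique(pred_labels, return_inverse=True)
    # Contingency table: entry (i, j) counts pixels with true id i and predicted id j.
    table = np.zeros((t.max() + 1, p.max() + 1), dtype=np.int64)
    np.add.at(table, (t, p), 1)

    def comb2(x):
        # Number of unordered pairs, n choose 2.
        return x * (x - 1) / 2.0

    sum_ij = comb2(table).sum()
    sum_a = comb2(table.sum(axis=1)).sum()
    sum_b = comb2(table.sum(axis=0)).sum()
    expected = sum_a * sum_b / comb2(true_labels.size)
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:  # degenerate case, e.g. both labelings are a single cluster
        return 1.0
    return float((sum_ij - expected) / (max_index - expected))


# Usage: mask a toy 32x32 RGB image into 8x8 patches.
rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
patches = patchify(image, patch_size=8)  # 16 patches, each 8*8*3 = 192 values
visible, masked = random_patch_mask(len(patches), mask_ratio=0.75, rng=rng)
encoder_input = patches[visible]  # only visible patches are encoded; the target is the full image
```

Object-centric papers often report a foreground variant (FG-ARI), obtained by dropping ground-truth background pixels from both labelings before calling `adjusted_rand_index`.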