Title: Understanding Visual Concepts Across Models

Jun 11, 2024

Brandon Trabucco, Max Gurinas, Kyle Doherty, Ruslan Salakhutdinov

Abstract: Large multimodal models such as Stable Diffusion can generate, detect, and classify new visual concepts after fine-tuning just a single word embedding. Do models learn similar words for the same concepts (i.e., <orange-cat> = orange + cat)? We conduct a large-scale analysis on three state-of-the-art models in text-to-image generation, open-set object detection, and zero-shot classification, and find that new word embeddings are model-specific and non-transferable. Across 4,800 new embeddings trained for 40 diverse visual concepts on four standard datasets, we find perturbations within an $\epsilon$-ball to any prior embedding that generate, detect, and classify an arbitrary concept. When these new embeddings are spliced into new models, fine-tuning that targets the original model is lost. We show popular soft prompt-tuning approaches find these perturbative solutions when applied to visual concept learning tasks, and embeddings for visual concepts are not transferable. Code for reproducing our work is available at: https://visual-words.github.io.

* Official code at: https://github.com/visual-words/visual-words
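
To make the abstract's "perturbations within an $\epsilon$-ball" concrete, here is a minimal sketch of how a learned embedding could be kept within a fixed distance of a prior embedding. It is not the authors' implementation. The function name `project_to_epsilon_ball`, the choice of the L2 norm, and the toy three-dimensional vectors are all assumptions made for illustration; the abstract does not state which norm or training procedure the paper uses.

```python
import math


def project_to_epsilon_ball(candidate, anchor, epsilon):
    """Project `candidate` onto the L2 ball of radius `epsilon` around `anchor`.

    Hypothetical helper: the L2 norm is an assumption, not taken from the paper.
    """
    delta = [c - a for c, a in zip(candidate, anchor)]
    norm = math.sqrt(sum(d * d for d in delta))
    if norm <= epsilon:
        # Already inside the ball: leave the embedding unchanged.
        return list(candidate)
    # Outside the ball: shrink the offset so its length equals epsilon.
    scale = epsilon / norm
    return [a + d * scale for a, d in zip(anchor, delta)]


# Toy vectors standing in for a prior word embedding and a fine-tuned one.
anchor = [1.0, 0.0, 0.0]
candidate = [4.0, 4.0, 0.0]

projected = project_to_epsilon_ball(candidate, anchor, epsilon=0.5)
distance = math.dist(projected, anchor)  # L2 distance from the prior embedding
```

In a training loop, a projection like this would typically be applied after each gradient step so the new embedding never drifts farther than `epsilon` from the prior embedding it started from.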
