Alexey Dosovitskiy

Moving Off-the-Grid: Scene-Grounded Video Representations

Nov 08, 2024

ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise Optimization

Jun 06, 2024

Simple Open-Vocabulary Object Detection with Vision Transformers

May 12, 2022

Scene Representation Transformer: Geometry-Free Novel View Synthesis Through Set-Latent Scene Representations

Nov 29, 2021

Conditional Object-Centric Learning from Video

Nov 24, 2021

Do Vision Transformers See Like Convolutional Neural Networks?

Aug 19, 2021

MLP-Mixer: An all-MLP Architecture for Vision

May 17, 2021

Differentiable Patch Selection for Image Recognition

Apr 07, 2021

Learning Object-Centric Video Models by Contrasting Sets

Nov 20, 2020

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Oct 22, 2020