Alexey Gritsenko

PaliGemma: A versatile 3B VLM for transfer

Jul 10, 2024 (via arXiv)

Time-, Memory- and Parameter-Efficient Visual Adaptation

Feb 05, 2024 (via arXiv)

Video OWL-ViT: Temporally-consistent open-world localization in video

Aug 22, 2023 (via arXiv)

Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution

Jul 12, 2023 (via arXiv)

Scaling Open-Vocabulary Object Detection

Jun 16, 2023 (via arXiv)

End-to-End Spatio-Temporal Action Localisation with Video Transformers

Apr 24, 2023 (via arXiv)

Scaling Vision Transformers to 22 Billion Parameters

Feb 10, 2023 (via arXiv)

Imagen Video: High Definition Video Generation with Diffusion Models

Oct 05, 2022 (via arXiv)

Beyond Transfer Learning: Co-finetuning for Action Localisation

Jul 08, 2022 (via arXiv)

Simple Open-Vocabulary Object Detection with Vision Transformers

May 12, 2022 (via arXiv)