Ross Girshick

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

Sep 25, 2024

SAM 2: Segment Anything in Images and Videos

Aug 01, 2024

PoliFormer: Scaling On-Policy RL with Transformers Results in Masterful Navigators

Jun 28, 2024

Segment Anything

Apr 05, 2023

The effectiveness of MAE pre-pretraining for billion-scale pretraining

Mar 23, 2023

Exploring Plain Vision Transformer Backbones for Object Detection

Mar 30, 2022

Revisiting Weakly Supervised Pre-Training of Visual Perception Models

Jan 20, 2022

Masked Autoencoders Are Scalable Vision Learners

Dec 02, 2021

Benchmarking Detection Transfer Learning with Vision Transformers

Nov 22, 2021

PyTorchVideo: A Deep Learning Library for Video Understanding

Nov 18, 2021