Pavan Kumar Anasosalu Vasu

Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions

Jul 09, 2024

Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum

May 21, 2024

CLIP with Quality Captions: A Strong Pretraining for Vision Tasks

May 14, 2024

MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training

Nov 28, 2023

SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding

Oct 23, 2023

FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization

Mar 24, 2023

An Improved One millisecond Mobile Backbone

Jun 08, 2022

Forward Compatible Training for Representation Learning

Dec 06, 2021

Instance-Level Task Parameters: A Robust Multi-task Weighting Framework

Jun 11, 2021