
Sachin Mehta

Efficient Vision-Language Models by Summarizing Visual Tokens into Compact Registers

Oct 17, 2024

CtrlSynth: Controllable Image Text Synthesis for Data-Efficient Multimodal Learning

Oct 15, 2024

SeedLM: Compressing LLM Weights into Seeds of Pseudo-Random Generators

Oct 14, 2024

KV Prediction for Improved Time to First Token

Oct 10, 2024

PathwayBench: Assessing Routability of Pedestrian Pathway Networks Inferred from Multi-City Imagery

Jul 23, 2024

LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference

Jul 19, 2024

OpenELM: An Efficient Language Model Family with Open Training and Inference Framework

May 02, 2024

CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data

Apr 24, 2024

Weight subcloning: direct initialization of transformers using larger pretrained ones

Dec 14, 2023

Label-efficient Training of Small Task-specific Models by Leveraging Vision Foundation Models

Nov 30, 2023