R. Manmatha

DocKD: Knowledge Distillation from LLMs for Open-World Document Understanding Models

Oct 04, 2024

NAVERO: Unlocking Fine-Grained Semantics for Video-Language Compositionality

Aug 18, 2024

VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding

Jul 17, 2024

Mixed-Query Transformer: A Unified Image Segmentation Architecture

Apr 06, 2024

On the Scalability of Diffusion-based Text-to-Image Generation

Apr 03, 2024

DEED: Dynamic Early Exit on Decoder for Accelerating Encoder-Decoder Transformer Models

Nov 15, 2023

Multiple-Question Multiple-Answer Text-VQA

Nov 15, 2023

DocTr: Document Transformer for Structured Information Extraction in Documents

Jul 16, 2023

DocFormerv2: Local Features for Document Understanding

Jun 02, 2023

PolyFormer: Referring Image Segmentation as Sequential Polygon Generation

Feb 14, 2023