Ron Litman

DocVLM: Make Your VLM an Efficient Reader

Dec 11, 2024
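
Only the title is given here, but "efficient reader" suggests feeding OCR output to a VLM at low token cost. Below is a minimal, hypothetical sketch of one common way to do that: compressing a long OCR token sequence into a small, fixed set of learned query embeddings (Perceiver/Q-Former style) before handing them to the language model. All names and sizes are illustrative assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn

class OCRCompressor(nn.Module):
    """Compress a long sequence of OCR token embeddings into a fixed, small
    set of learned queries via cross-attention. Hypothetical sketch; the
    dimensions and wiring are assumptions, not the paper's design."""
    def __init__(self, dim=768, num_queries=64, num_heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.LayerNorm(dim), nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, ocr_tokens):                       # (B, N_ocr, dim); N_ocr may be thousands
        B = ocr_tokens.size(0)
        q = self.queries.unsqueeze(0).expand(B, -1, -1)  # (B, K, dim)
        attended, _ = self.cross_attn(q, ocr_tokens, ocr_tokens)
        return attended + self.ff(attended)              # (B, K, dim): fixed-size summary for the LLM

compressor = OCRCompressor()
ocr = torch.randn(2, 2000, 768)
print(compressor(ocr).shape)  # torch.Size([2, 64, 768])
```

The point of the pattern is that the language model's input cost no longer scales with document length: however many OCR tokens come in, only K compressed embeddings go out.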

TAP-VL: Text Layout-Aware Pre-training for Enriched Vision-Language Models

Nov 07, 2024

VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding

Jul 17, 2024

M3T: A New Benchmark Dataset for Multi-Modal Document-Level Machine Translation

Jun 12, 2024

Question Aware Vision Transformer for Multimodal Reasoning

Feb 08, 2024
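
The title points at a vision transformer conditioned on the question. A hedged sketch of that general idea follows: a ViT block with an added, zero-initialized gated cross-attention from patch tokens to question-text tokens, so the encoder can attend to question-relevant content without disturbing pretrained behavior at initialization. This illustrates the concept only; it is not necessarily the paper's architecture.

```python
import torch
import torch.nn as nn

class QuestionAwareBlock(nn.Module):
    """ViT-style block augmented with gated cross-attention from patch tokens
    to question tokens. Illustrative sketch of a 'question-aware' encoder."""
    def __init__(self, dim=768, heads=12):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1, self.norm2, self.norm3 = nn.LayerNorm(dim), nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.gate = nn.Parameter(torch.zeros(1))  # tanh(0) = 0: cross-attention starts closed

    def forward(self, patches, question):  # (B, N_patch, dim), (B, N_text, dim)
        x = patches
        h = self.norm1(x)
        x = x + self.self_attn(h, h, h)[0]
        h = self.norm2(x)
        x = x + torch.tanh(self.gate) * self.cross_attn(h, question, question)[0]
        return x + self.mlp(self.norm3(x))
```

Zero-initializing the gate is a common trick when grafting new attention paths onto a pretrained encoder: training opens the gate gradually instead of corrupting the pretrained features on step one.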

GRAM: Global Reasoning for Multi-Page VQA

Jan 07, 2024
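
"Global reasoning for multi-page VQA" suggests combining per-page encoding with some cross-page mechanism. A generic sketch under that assumption: encode pages independently, pool one summary vector per page, and run a lightweight transformer layer across the page summaries. The pooling choice and layer counts are illustrative.

```python
import torch
import torch.nn as nn

class GlobalPageReasoner(nn.Module):
    """Per-page encoding followed by cross-page attention over page summaries.
    Generic illustration of multi-page reasoning, not the paper's exact model."""
    def __init__(self, dim=512, heads=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.page_encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.global_layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)

    def forward(self, pages):                 # (B, P, N, dim): P pages, N tokens per page
        B, P, N, D = pages.shape
        per_page = self.page_encoder(pages.reshape(B * P, N, D))
        summaries = per_page.mean(dim=1).reshape(B, P, D)  # one summary per page
        return self.global_layer(summaries)   # (B, P, dim): summaries attend across pages
```

Keeping full attention within pages and restricting cross-page attention to summaries keeps cost linear in page count rather than quadratic in total tokens.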

Towards Models that Can See and Read

Jan 18, 2023

CLIPTER: Looking at the Bigger Picture in Scene Text Recognition

Jan 18, 2023
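
"Looking at the bigger picture" in scene text recognition suggests conditioning a crop-level recognizer on the full scene. A hypothetical sketch of that idea: project a single scene-level embedding (e.g., a CLIP image embedding) and let the recognizer's per-crop features attend to it. The wiring and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class GlobalContextFusion(nn.Module):
    """Fuse one scene-level embedding into a recognizer's per-crop feature
    sequence via cross-attention with a residual connection. Illustrative only."""
    def __init__(self, crop_dim=256, clip_dim=512, heads=4):
        super().__init__()
        self.proj = nn.Linear(clip_dim, crop_dim)
        self.attn = nn.MultiheadAttention(crop_dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(crop_dim)

    def forward(self, crop_feats, scene_embed):   # (B, T, crop_dim), (B, clip_dim)
        ctx = self.proj(scene_embed).unsqueeze(1) # (B, 1, crop_dim)
        fused, _ = self.attn(self.norm(crop_feats), ctx, ctx)
        return crop_feats + fused                 # context-enriched crop features
```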

Out-of-Vocabulary Challenge Report

Sep 14, 2022

Multimodal Semi-Supervised Learning for Text Recognition

May 08, 2022