Roy Ganz

DocVLM: Make Your VLM an Efficient Reader

Dec 11, 2024

TAP-VL: Text Layout-Aware Pre-training for Enriched Vision-Language Models

Nov 07, 2024

Text-to-Image Generation Via Energy-Based CLIP

Aug 30, 2024

Adversaries With Incentives: A Strategic Alternative to Adversarial Robustness

Jun 17, 2024

Enhancing Consistency-Based Image Generation via Adversarialy-Trained Classification and Energy-Based Discrimination

May 25, 2024

Paint by Inpaint: Learning to Add Image Objects by Removing Them First

Apr 28, 2024

Question Aware Vision Transformer for Multimodal Reasoning

Feb 08, 2024

GRAM: Global Reasoning for Multi-Page VQA

Jan 07, 2024

CLIPAG: Towards Generator-Free Text-to-Image Generation

Jun 29, 2023

FuseCap: Leveraging Large Language Models to Fuse Visual Data into Enriched Image Captions

May 28, 2023