
Roy Ganz

TAP-VL: Text Layout-Aware Pre-training for Enriched Vision-Language Models

Nov 07, 2024
Via arXiv

Text-to-Image Generation Via Energy-Based CLIP

Aug 30, 2024
Via arXiv

Adversaries With Incentives: A Strategic Alternative to Adversarial Robustness

Jun 17, 2024
Via arXiv

Enhancing Consistency-Based Image Generation via Adversarialy-Trained Classification and Energy-Based Discrimination

May 25, 2024
Via arXiv

Paint by Inpaint: Learning to Add Image Objects by Removing Them First

Apr 28, 2024
Via arXiv

Question Aware Vision Transformer for Multimodal Reasoning

Feb 08, 2024
Via arXiv

GRAM: Global Reasoning for Multi-Page VQA

Jan 07, 2024
Via arXiv

CLIPAG: Towards Generator-Free Text-to-Image Generation

Jun 29, 2023
Via arXiv

FuseCap: Leveraging Large Language Models to Fuse Visual Data into Enriched Image Captions

May 28, 2023
Via arXiv

Classifier Robustness Enhancement Via Test-Time Transformation

Mar 27, 2023
Via arXiv