Roy Ganz

TAP-VL: Text Layout-Aware Pre-training for Enriched Vision-Language Models

Nov 07, 2024
Via arXiv

Text-to-Image Generation Via Energy-Based CLIP

Aug 30, 2024
Via arXiv

Adversaries With Incentives: A Strategic Alternative to Adversarial Robustness

Jun 17, 2024
Via arXiv

Enhancing Consistency-Based Image Generation via Adversarialy-Trained Classification and Energy-Based Discrimination

May 25, 2024
Via arXiv

Paint by Inpaint: Learning to Add Image Objects by Removing Them First

Apr 28, 2024
Via arXiv

Question Aware Vision Transformer for Multimodal Reasoning

Feb 08, 2024
Via arXiv

GRAM: Global Reasoning for Multi-Page VQA

Jan 07, 2024
Via arXiv

CLIPAG: Towards Generator-Free Text-to-Image Generation

Jun 29, 2023
Via arXiv

FuseCap: Leveraging Large Language Models to Fuse Visual Data into Enriched Image Captions

May 28, 2023
Via arXiv

Classifier Robustness Enhancement Via Test-Time Transformation

Mar 27, 2023
Via arXiv