Yonatan Bitton

Bridging the Visual Gap: Fine-Tuning Multimodal Models with Knowledge-Adapted Captions

Nov 13, 2024

KITTEN: A Knowledge-Intensive Evaluation of Image Generation on Visual Entities

Oct 15, 2024

NL-Eye: Abductive NLI for Images

Oct 03, 2024

Visual Riddles: a Commonsense and World Knowledge Challenge for Large Vision and Language Models

Jul 28, 2024

Contrastive Sequential-Diffusion Learning: An approach to Multi-Scene Instructional Video Synthesis

Jul 16, 2024

Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision

Jul 08, 2024

Beyond Thumbs Up/Down: Untangling Challenges of Fine-Grained Feedback for Text-to-Image Generation

Jun 24, 2024

DataComp-LM: In search of the next generation of training sets for language models

Jun 18, 2024

VideoPhy: Evaluating Physical Commonsense for Video Generation

Jun 05, 2024

Generating Coherent Sequences of Visual Illustrations for Real-World Manual Tasks

May 16, 2024