Michael Saxon

Benchmarks as Microscopes: A Call for Model Metrology

Jul 22, 2024
Via arXiv

VSP: Assessing the dual challenges of perception and reasoning in spatial planning tasks for VLMs

Jul 02, 2024
Via arXiv

Losing Visual Needles in Image Haystacks: Vision Language Models are Easily Distracted in Short and Long Contexts

Jun 24, 2024
Via arXiv

TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation

Jun 12, 2024
Via arXiv

Who Evaluates the Evaluations? Objectively Scoring Text-to-Image Prompt Coherence Metrics with T2IScoreScore (TS2)

Apr 05, 2024
Via arXiv

Lost in Translation? Translation Errors and Challenges for Fair Assessment of Text-to-Image Models on Multilingual Concepts

Mar 17, 2024
Via arXiv

Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies

Aug 06, 2023
Via arXiv

Multilingual Conceptual Coverage in Text-to-Image Models

Jun 02, 2023
Via arXiv

Let's Think Frame by Frame: Evaluating Video Chain of Thought with Video Infilling and Prediction

May 23, 2023
Via arXiv

Data Augmentation for Diverse Voice Conversion in Noisy Environments

May 18, 2023
Via arXiv