Mohamed Elhoseiny

Document Haystacks: Vision-Language Reasoning Over Piles of 1000+ Documents

Nov 23, 2024
Via arXiv

No Culture Left Behind: ArtELingo-28, a Benchmark of WikiArt with Captions in 28 Languages

Nov 06, 2024
Via arXiv

AutoBench-V: Can Large Vision-Language Models Benchmark Themselves?

Oct 29, 2024
Via arXiv

LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding

Oct 22, 2024
Via arXiv

Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models

Aug 27, 2024
Via arXiv

Openstory++: A Large-scale Dataset and Benchmark for Instance-aware Open-domain Visual Storytelling

Aug 07, 2024
Via arXiv

How Well Can Vision Language Models See Image Details?

Aug 07, 2024
Via arXiv

Goldfish: Vision-Language Understanding of Arbitrarily Long Videos

Jul 17, 2024
Via arXiv

MiniGPT-Med: Large Language Model as a General Interface for Radiology Diagnosis

Jul 04, 2024
Via arXiv

Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time

Jul 01, 2024
Via arXiv