Kate Sanders

CLAIMCHECK: How Grounded are LLM Critiques of Scientific Papers?

Mar 27, 2025

MMMORRF: Multimodal Multilingual Modularized Reciprocal Rank Fusion

Mar 26, 2025

Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval

Mar 24, 2025

Randomly Sampled Language Reasoning Problems Reveal Limits of LLMs

Jan 07, 2025

MultiVENT 2.0: A Massive Multilingual Benchmark for Event-Centric Video Retrieval

Oct 15, 2024

Grounding Partially-Defined Events in Multimodal Data

Oct 07, 2024

Core: Robust Factual Precision with Informative Sub-Claim Identification

Jul 04, 2024

A Survey of Video Datasets for Grounded Event Understanding

Jun 14, 2024

On the Evaluation of Machine-Generated Reports

May 02, 2024

Tur[k]ingBench: A Challenge Benchmark for Web Agents

Mar 21, 2024