Gabriel Stanovsky

Seeing the Forest for the Trees: A Large Scale, Continuously Updating Meta-Analysis of Frontier LLMs

Feb 26, 2025
Via arXiv

WildFrame: Comparing Framing in Humans and LLMs on Naturally Occurring Texts

Feb 24, 2025
Via arXiv

Trust Me, I'm Wrong: High-Certainty Hallucinations in LLMs

Feb 18, 2025
Via arXiv

Beyond Benchmarks: On The False Promise of AI Regulation

Jan 26, 2025
Via arXiv

Improving Image Captioning by Mimicking Human Reformulation Feedback at Inference-time

Jan 08, 2025
Via arXiv

The State and Fate of Summarization Datasets

Nov 07, 2024
Via arXiv

SAUCE: Synchronous and Asynchronous User-Customizable Environment for Multi-Agent LLM Interaction

Nov 05, 2024
Via arXiv

Looking Beyond The Top-1: Transformers Determine Top Tokens In Order

Oct 26, 2024
Via arXiv

Surveying the Landscape of Image Captioning Evaluation: A Comprehensive Taxonomy and Novel Ensemble Method

Aug 09, 2024
Via arXiv

SEAM: A Stochastic Benchmark for Multi-Document Tasks

Jun 23, 2024
Via arXiv