Felipe Maia Polo

Rich Insights from Cheap Signals: Efficient Evaluations via Tensor Factorization

Mar 04, 2026
Via arXiv

CARROT: A Cost Aware Rate Optimal Router

Feb 05, 2025
Via arXiv

Sloth: scaling laws for LLM skills to predict multi-benchmark performance across families

Dec 09, 2024
Via arXiv

Microfoundation Inference for Strategic Prediction

Nov 13, 2024
Via arXiv

LiveXiv -- A Multi-Modal Live Benchmark Based on Arxiv Papers Content

Oct 15, 2024
Via arXiv

Efficient multi-prompt evaluation of LLMs

May 27, 2024
Via arXiv

A statistical framework for weak-to-strong generalization

May 25, 2024
Via arXiv

tinyBenchmarks: evaluating LLMs with fewer examples

Feb 22, 2024
Via arXiv

Estimating Fréchet bounds for validating programmatic weak supervision

Dec 07, 2023
Via arXiv

Fusing Models with Complementary Expertise

Oct 02, 2023
Via arXiv