Felipe Maia Polo

Sloth: scaling laws for LLM skills to predict multi-benchmark performance across families

Dec 09, 2024

Microfoundation Inference for Strategic Prediction

Nov 13, 2024

LiveXiv -- A Multi-Modal Live Benchmark Based on Arxiv Papers Content

Oct 15, 2024

Efficient multi-prompt evaluation of LLMs

May 27, 2024

A statistical framework for weak-to-strong generalization

May 25, 2024

tinyBenchmarks: evaluating LLMs with fewer examples

Feb 22, 2024

Estimating Fréchet bounds for validating programmatic weak supervision

Dec 07, 2023

Fusing Models with Complementary Expertise

Oct 02, 2023

Conditional independence testing under model misspecification

Jul 05, 2023

A unified framework for dataset shift diagnostics

May 17, 2022