Sean Welleck

AlphaVerus: Bootstrapping Formally Verified Code Generation through Self-Improving Translation and Treefinement

Dec 09, 2024

Evaluating Language Models as Synthetic Data Generators

Dec 04, 2024

ImProver: Agent-Based Automated Proof Optimization

Oct 07, 2024

miniCTX: Neural Theorem Proving with (Long-)Contexts

Aug 05, 2024

An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models

Aug 01, 2024

Lean-STaR: Learning to Interleave Thinking and Proving

Jul 14, 2024

From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models

Jun 24, 2024

miniCodeProps: a Minimal Benchmark for Proving Code Properties

Jun 16, 2024

The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models

Jun 09, 2024

Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models

May 02, 2024