Martin Vechev

Type-Constrained Code Generation with Language Models

Apr 12, 2025

Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad

Mar 27, 2025

Automated Benchmark Generation for Repository-Level Coding Tasks

Mar 10, 2025

ToolFuzz -- Automated Agent Tool Testing

Mar 06, 2025

GRAIN: Exact Graph Reconstruction from Gradients

Mar 03, 2025

BaxBench: Can LLMs Generate Correct and Secure Backends?

Feb 20, 2025

BgGPT 1.0: Extending English-centric LLMs to other languages

Dec 14, 2024

A Unified Approach to Routing and Cascading for LLMs

Oct 14, 2024

COMPL-AI Framework: A Technical Interpretation and LLM Benchmarking Suite for the EU Artificial Intelligence Act

Oct 10, 2024

Average Certified Radius is a Poor Metric for Randomized Smoothing

Oct 09, 2024