Martin Vechev

Automated Benchmark Generation for Repository-Level Coding Tasks

Mar 10, 2025

ToolFuzz -- Automated Agent Tool Testing

Mar 06, 2025

GRAIN: Exact Graph Reconstruction from Gradients

Mar 03, 2025

BaxBench: Can LLMs Generate Correct and Secure Backends?

Feb 20, 2025

BgGPT 1.0: Extending English-centric LLMs to other languages

Dec 14, 2024

A Unified Approach to Routing and Cascading for LLMs

Oct 14, 2024

COMPL-AI Framework: A Technical Interpretation and LLM Benchmarking Suite for the EU Artificial Intelligence Act

Oct 10, 2024

Multi-Neuron Unleashes Expressivity of ReLU Networks Under Convex Relaxation

Oct 09, 2024

Average Certified Radius is a Poor Metric for Randomized Smoothing

Oct 09, 2024

Ward: Provable RAG Dataset Inference via LLM Watermarks

Oct 04, 2024