Jacques Fleuriot

APE-Bench I: Towards File-level Automated Proof Engineering of Formal Math Libraries

Apr 27, 2025

Huajian Xin, Luming Li, Xiaoran Jin, Jacques Fleuriot, Wenda Li

Abstract: Recent progress in large language models (LLMs) has shown promise in formal theorem proving, yet existing benchmarks remain limited to isolated, static proof tasks, failing to capture the iterative, engineering-intensive workflows of real-world formal mathematics libraries. Motivated by analogous advances in software engineering, we introduce the paradigm of Automated Proof Engineering (APE), which aims to automate proof engineering tasks such as feature addition, proof refactoring, and bug fixing using LLMs. To facilitate research in this direction, we present APE-Bench I, the first realistic benchmark built from real-world commit histories of Mathlib4, featuring diverse file-level tasks described in natural language and verified via a hybrid approach combining the Lean compiler and LLM-as-a-Judge. We further develop Eleanstic, a scalable parallel verification infrastructure optimized for proof checking across multiple versions of Mathlib. Empirical results on state-of-the-art LLMs demonstrate strong performance on localized edits but substantial degradation on complex proof engineering tasks. This work lays the foundation for developing agentic workflows in proof engineering, with future benchmarks targeting multi-file coordination, project-scale verification, and autonomous agents capable of planning, editing, and repairing formal libraries.

Formalising the Foundations of Discrete Reinforcement Learning in Isabelle/HOL

Dec 11, 2021

Mark Chevallier, Jacques Fleuriot

Abstract: We present a formalisation of finite Markov decision processes with rewards in the Isabelle theorem prover. We focus on the foundations required for dynamic programming and the use of reinforcement learning agents over such processes. In particular, we derive the Bellman equation from first principles (in both scalar and vector form), derive a vector calculation that produces the expected value of any policy p, and go on to prove the existence of a universally optimal policy where there is a discounting factor less than one. Lastly, we prove that the value iteration and policy iteration algorithms terminate in finite time, producing an epsilon-optimal and a fully optimal policy respectively.
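For readers unfamiliar with value iteration, the following is a minimal Python sketch of the textbook algorithm on a finite MDP. It is not taken from the paper's Isabelle/HOL development: the names `value_iteration`, `q_value`, and `toy_mdp`, the dictionary encoding of transitions, and the simple stopping test on the largest value change are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: transitions[state][action] is a list of
# (probability, next_state, reward) triples.
Transitions = Dict[str, Dict[str, List[Tuple[float, str, float]]]]


def q_value(outcomes, values, gamma):
    """Expected one-step reward plus discounted value of the successor state."""
    return sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)


def value_iteration(transitions: Transitions, gamma: float = 0.9,
                    tolerance: float = 1e-8):
    """Repeat the Bellman optimality update until values change by < tolerance."""
    values = {s: 0.0 for s in transitions}
    while True:
        new_values = {
            s: max(q_value(outcomes, values, gamma) for outcomes in actions.values())
            for s, actions in transitions.items()
        }
        delta = max(abs(new_values[s] - values[s]) for s in transitions)
        values = new_values
        if delta < tolerance:
            break
    # Read off a greedy policy from the final value estimates.
    policy = {
        s: max(actions, key=lambda a: q_value(actions[a], values, gamma))
        for s, actions in transitions.items()
    }
    return values, policy


# Toy two-state MDP (illustrative only): from "a" one can stay for reward 0
# or move to "b" for reward 1; "b" loops on itself with reward 2.
toy_mdp: Transitions = {
    "a": {"stay": [(1.0, "a", 0.0)], "go": [(1.0, "b", 1.0)]},
    "b": {"stay": [(1.0, "b", 2.0)]},
}

values, policy = value_iteration(toy_mdp)
print(values, policy)
```

With a discount factor below one the update is a contraction, so the loop stops after finitely many sweeps; the returned policy is greedy with respect to an approximation of the optimal values, which is the sense in which value iteration yields an approximately optimal policy.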

Dr.Aid: Supporting Data-governance Rule Compliance for Decentralized Collaboration in an Automated Way

Oct 03, 2021

Rui Zhao, Malcolm Atkinson, Petros Papapanagiotou, Federica Magnoni, Jacques Fleuriot

Abstract: Collaboration across institutional boundaries is widespread and increasing today. It depends on federations sharing data that often have governance rules or external regulations restricting their use. However, the handling of data governance rules (a.k.a. data-use policies) remains manual, time-consuming and error-prone, limiting the rate at which collaborations can form and respond to challenges and opportunities, inhibiting citizen science and reducing data providers' trust in compliance. Using an automated system to facilitate compliance handling substantially reduces the time needed for such non-mission work, thereby accelerating collaboration and improving productivity. We present a framework, Dr.Aid, that helps individuals, organisations and federations comply with data rules, using automation to track which rules are applicable as data is passed between processes and as derived data is generated. It encodes data-governance rules using a formal language and performs reasoning on multi-input-multi-output data-flow graphs in decentralised contexts. We test its power and utility by working with users performing cyclone tracking and earthquake modelling to support mitigation and emergency response. We query standard provenance traces to detach Dr.Aid from details of the tools and systems they are using, as these inevitably vary across members of a federation and through time. We evaluate the model in three aspects by encoding real-life data-use policies from diverse fields, showing its capability for real-world usage and its advantages compared with traditional frameworks. We argue that this approach will lead to more agile, more productive and more trustworthy collaborations and show that the approach can be adopted incrementally. This, in turn, will allow more appropriate data policies to emerge, opening up new forms of collaboration.

* Accepted for The 24th ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW)
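To make the idea of tracking applicable rules through a data-flow graph concrete, here is a deliberately simplified Python sketch. It is not Dr.Aid's rule language or reasoner: it assumes that rules are opaque labels and that every derived data item simply inherits the union of the rules on its inputs. The function `propagate_rules`, the node names, and the rule labels are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def propagate_rules(inputs_of, source_rules):
    """Attach to every node the rules inherited from all of its inputs.

    inputs_of:    maps a node to the set of nodes whose output it consumes.
    source_rules: maps a node to the rules attached directly to its data.
    Assumption: a derived item inherits the union of its inputs' rules.
    """
    rules = {}
    # static_order() yields each node only after all of its inputs.
    for node in TopologicalSorter(inputs_of).static_order():
        inherited = set().union(*(rules[p] for p in inputs_of.get(node, ())))
        rules[node] = set(source_rules.get(node, ())) | inherited
    return rules


# Illustrative multi-input workflow: two sources feed one process,
# whose output feeds a report.
inputs_of = {
    "track": {"satellite", "model"},
    "report": {"track"},
}
source_rules = {
    "satellite": {"acknowledge-provider"},
    "model": {"no-commercial-use"},
}

rules = propagate_rules(inputs_of, source_rules)
print(sorted(rules["report"]))
```

A real system would also need rules that change or expire as data is transformed, which a plain union cannot express; the sketch only shows why a topological traversal of the graph is a natural fit for this kind of tracking.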

Social Network Processes in the Isabelle and Coq Theorem Proving Communities

Sep 22, 2016

Jacques Fleuriot, Steven Obua, Phil Scott

Abstract: We identify the main actors in the Isabelle and Coq communities and describe how they affect and influence their peers. This work explores selected foundations of social networking analysis that we expect to be useful in the context of the ProofPeer project, which is developing a new model for interactive theorem proving based on collaboration and social interactions.

* 15 pages, 13 figures, Research supported by EPSRC grant EP/L011794/1
