Title: Themisto: Jupyter-Based Runtime Benchmark

Apr 16, 2025

Konstantin Grotov, Sergey Titov

Abstract: In this work, we present a benchmark that consists of Jupyter notebook development trajectories and measures how well large language models (LLMs) can leverage runtime information for code output prediction and code generation. We demonstrate that the current generation of LLMs performs poorly on these tasks and argue that incorporating the runtime context is a significantly understudied domain in the development of code-based models.

* Accepted to the third Deep Learning for Code (DL4C) workshop @ ICLR 2025
