Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Rajeev Alur

University of Pennsylvania

Stable Prediction of Adverse Events in Medical Time-Series Data

Oct 16, 2025

Mayank Keoliya, Seewon Choi, Rajeev Alur, Mayur Naik, Eric Wong

Abstract:Early event prediction (EEP) systems continuously estimate a patient's imminent risk to support clinical decision-making. For bedside trust, risk trajectories must be accurate and temporally stable, shifting only with new, relevant evidence. However, current benchmarks (a) ignore stability of risk scores and (b) evaluate mainly on tabular inputs, leaving trajectory behavior untested. To address this gap, we introduce CAREBench, an EEP benchmark that evaluates deployability using multi-modal inputs-tabular EHR, ECG waveforms, and clinical text-and assesses temporal stability alongside predictive accuracy. We propose a stability metric that quantifies short-term variability in per-patient risk and penalizes abrupt oscillations based on local-Lipschitz constants. CAREBench spans six prediction tasks such as sepsis onset and compares classical learners, deep sequence models, and zero-shot LLMs. Across tasks, existing methods, especially LLMs, struggle to jointly optimize accuracy and stability, with notably poor recall at high-precision operating points. These results highlight the need for models that produce evidence-aligned, stable trajectories to earn clinician trust in continuous monitoring settings. (Code: https://github.com/SeewonChoi/CAREBench.)

* 18 pages, 3 Figures

Via

Access Paper or Ask Questions

Composing Agents to Minimize Worst-case Risk

Jun 05, 2025

Guruprerana Shabadi, Rajeev Alur

Abstract:From software development to robot control, modern agentic systems decompose complex objectives into a sequence of subtasks and choose a set of specialized AI agents to complete them. We formalize an agentic workflow as a directed acyclic graph, called an agent graph, where edges represent AI agents and paths correspond to feasible compositions of agents. When deploying these systems in the real world, we need to choose compositions of agents that not only maximize the task success, but also minimize risk where the risk captures requirements like safety, fairness, and privacy. This additionally requires carefully analyzing the low-probability (tail) behaviors of compositions of agents. In this work, we consider worst-case risk minimization over the set of feasible agent compositions. We define worst-case risk as the tail quantile -- also known as value-at-risk -- of the loss distribution of the agent composition where the loss quantifies the risk associated with agent behaviors. We introduce an efficient algorithm that traverses the agent graph and finds a near-optimal composition of agents by approximating the value-at-risk via a union bound and dynamic programming. Furthermore, we prove that the approximation is near-optimal asymptotically for a broad class of practical loss functions. To evaluate our framework, we consider a suite of video game-like control benchmarks that require composing several agents trained with reinforcement learning and demonstrate our algorithm's effectiveness in approximating the value-at-risk and identifying the optimal agent composition.

* 17 pages, 4 figures

Via

Access Paper or Ask Questions

Scenario-based Compositional Verification of Autonomous Systems with Neural Perception

Apr 29, 2025

Christopher Watson, Rajeev Alur, Divya Gopinath, Ravi Mangal, Corina S. Pasareanu

Abstract:Recent advances in deep learning have enabled the development of autonomous systems that use deep neural networks for perception. Formal verification of these systems is challenging due to the size and complexity of the perception DNNs as well as hard-to-quantify, changing environment conditions. To address these challenges, we propose a probabilistic verification framework for autonomous systems based on the following key concepts: (1) Scenario-based Modeling: We decompose the task (e.g., car navigation) into a composition of scenarios, each representing a different environment condition. (2) Probabilistic Abstractions: For each scenario, we build a compact abstraction of perception based on the DNN's performance on an offline dataset that represents the scenario's environment condition. (3) Symbolic Reasoning and Acceleration: The abstractions enable efficient compositional verification of the autonomous system via symbolic reasoning and a novel acceleration proof rule that bounds the error probability of the system under arbitrary variations of environment conditions. We illustrate our approach on two case studies: an experimental autonomous system that guides airplanes on taxiways using high-dimensional perception DNNs and a simulation model of an F1Tenth autonomous car using LiDAR observations.

Via

Access Paper or Ask Questions

CTSketch: Compositional Tensor Sketching for Scalable Neurosymbolic Learning

Mar 31, 2025

Seewon Choi, Alaia Solko-Breslin, Rajeev Alur, Eric Wong

Abstract:Many computational tasks benefit from being formulated as the composition of neural networks followed by a discrete symbolic program. The goal of neurosymbolic learning is to train the neural networks using only end-to-end input-output labels of the composite. We introduce CTSketch, a novel, scalable neurosymbolic learning algorithm. CTSketch uses two techniques to improve the scalability of neurosymbolic inference: decompose the symbolic program into sub-programs and summarize each sub-program with a sketched tensor. This strategy allows us to approximate the output distribution of the program with simple tensor operations over the input distributions and summaries. We provide theoretical insight into the maximum error of the approximation. Furthermore, we evaluate CTSketch on many benchmarks from the neurosymbolic literature, including some designed for evaluating scalability. Our results show that CTSketch pushes neurosymbolic learning to new scales that have previously been unattainable by obtaining high accuracy on tasks involving over one thousand inputs.

* 15 pages, 6 figures

Via

Access Paper or Ask Questions

Relational Programming with Foundation Models

Dec 19, 2024

Ziyang Li, Jiani Huang, Jason Liu, Felix Zhu, Eric Zhao, William Dodds, Neelay Velingker, Rajeev Alur, Mayur Naik

Abstract:Foundation models have vast potential to enable diverse AI applications. The powerful yet incomplete nature of these models has spurred a wide range of mechanisms to augment them with capabilities such as in-context learning, information retrieval, and code interpreting. We propose Vieira, a declarative framework that unifies these mechanisms in a general solution for programming with foundation models. Vieira follows a probabilistic relational paradigm and treats foundation models as stateless functions with relational inputs and outputs. It supports neuro-symbolic applications by enabling the seamless combination of such models with logic programs, as well as complex, multi-modal applications by streamlining the composition of diverse sub-models. We implement Vieira by extending the Scallop compiler with a foreign interface that supports foundation models as plugins. We implement plugins for 12 foundation models including GPT, CLIP, and SAM. We evaluate Vieira on 9 challenging tasks that span language, vision, and structured and vector databases. Our evaluation shows that programs in Vieira are concise, can incorporate modern foundation models, and have comparable or better accuracy than competitive baselines.

Via

Access Paper or Ask Questions

Logicbreaks: A Framework for Understanding Subversion of Rule-based Inference

Jun 21, 2024

Anton Xue, Avishree Khare, Rajeev Alur, Surbhi Goel, Eric Wong

Abstract:We study how to subvert language models from following the rules. We model rule-following as inference in propositional Horn logic, a mathematical system in which rules have the form "if $P$ and $Q$, then $R$" for some propositions $P$, $Q$, and $R$. We prove that although transformers can faithfully abide by such rules, maliciously crafted prompts can nevertheless mislead even theoretically constructed models. Empirically, we find that attacks on our theoretical models mirror popular attacks on large language models. Our work suggests that studying smaller theoretical models can help understand the behavior of large language models in rule-based settings like logical reasoning and jailbreak attacks.

Via

Access Paper or Ask Questions

Data-Efficient Learning with Neural Programs

Jun 10, 2024

Alaia Solko-Breslin, Seewon Choi, Ziyang Li, Neelay Velingker, Rajeev Alur, Mayur Naik, Eric Wong

Figure 1 for Data-Efficient Learning with Neural Programs

Figure 2 for Data-Efficient Learning with Neural Programs

Figure 3 for Data-Efficient Learning with Neural Programs

Figure 4 for Data-Efficient Learning with Neural Programs

Abstract:Many computational tasks can be naturally expressed as a composition of a DNN followed by a program written in a traditional programming language or an API call to an LLM. We call such composites "neural programs" and focus on the problem of learning the DNN parameters when the training data consist of end-to-end input-output labels for the composite. When the program is written in a differentiable logic programming language, techniques from neurosymbolic learning are applicable, but in general, the learning for neural programs requires estimating the gradients of black-box components. We present an algorithm for learning neural programs, called ISED, that only relies on input-output samples of black-box components. For evaluation, we introduce new benchmarks that involve calls to modern LLMs such as GPT-4 and also consider benchmarks from the neurosymolic learning literature. Our evaluation shows that for the latter benchmarks, ISED has comparable performance to state-of-the-art neurosymbolic frameworks. For the former, we use adaptations of prior work on gradient approximations of black-box components as a baseline, and show that ISED achieves comparable accuracy but in a more data- and sample-efficient manner.

Via

Access Paper or Ask Questions

Stability Guarantees for Feature Attributions with Multiplicative Smoothing

Jul 12, 2023

Anton Xue, Rajeev Alur, Eric Wong

Figure 1 for Stability Guarantees for Feature Attributions with Multiplicative Smoothing

Figure 2 for Stability Guarantees for Feature Attributions with Multiplicative Smoothing

Figure 3 for Stability Guarantees for Feature Attributions with Multiplicative Smoothing

Figure 4 for Stability Guarantees for Feature Attributions with Multiplicative Smoothing

Abstract:Explanation methods for machine learning models tend to not provide any formal guarantees and may not reflect the underlying decision-making process. In this work, we analyze stability as a property for reliable feature attribution methods. We prove that relaxed variants of stability are guaranteed if the model is sufficiently Lipschitz with respect to the masking of features. To achieve such a model, we develop a smoothing method called Multiplicative Smoothing (MuS). We show that MuS overcomes theoretical limitations of standard smoothing techniques and can be integrated with any classifier and feature attribution method. We evaluate MuS on vision and language models with a variety of feature attribution methods, such as LIME and SHAP, and demonstrate that MuS endows feature attributions with non-trivial stability guarantees.

Via

Access Paper or Ask Questions

Policy Synthesis and Reinforcement Learning for Discounted LTL

May 29, 2023

Rajeev Alur, Osbert Bastani, Kishor Jothimurugan, Mateo Perez, Fabio Somenzi, Ashutosh Trivedi

Abstract:The difficulty of manually specifying reward functions has led to an interest in using linear temporal logic (LTL) to express objectives for reinforcement learning (RL). However, LTL has the downside that it is sensitive to small perturbations in the transition probabilities, which prevents probably approximately correct (PAC) learning without additional assumptions. Time discounting provides a way of removing this sensitivity, while retaining the high expressivity of the logic. We study the use of discounted LTL for policy synthesis in Markov decision processes with unknown transition probabilities, and show how to reduce discounted LTL to discounted-sum reward via a reward machine when all discount factors are identical.

Via

Access Paper or Ask Questions

Robust Subtask Learning for Compositional Generalization

Feb 06, 2023

Kishor Jothimurugan, Steve Hsu, Osbert Bastani, Rajeev Alur

Abstract:Compositional reinforcement learning is a promising approach for training policies to perform complex long-horizon tasks. Typically, a high-level task is decomposed into a sequence of subtasks and a separate policy is trained to perform each subtask. In this paper, we focus on the problem of training subtask policies in a way that they can be used to perform any task; here, a task is given by a sequence of subtasks. We aim to maximize the worst-case performance over all tasks as opposed to the average-case performance. We formulate the problem as a two agent zero-sum game in which the adversary picks the sequence of subtasks. We propose two RL algorithms to solve this game: one is an adaptation of existing multi-agent RL algorithms to our setting and the other is an asynchronous version which enables parallel training of subtask policies. We evaluate our approach on two multi-task environments with continuous states and actions and demonstrate that our algorithms outperform state-of-the-art baselines.

Via

Access Paper or Ask Questions