Abhinav Verma

The Pennsylvania State University

Joint Analysis of Optical and SAR Vegetation Indices for Vineyard Monitoring: Assessing Biomass Dynamics and Phenological Stages over Po Valley, Italy

Jun 16, 2025

Andrea Bergamaschi, Abhinav Verma, Avik Bhattacharya, Fabio Dell'Acqua

Abstract: Multi-polarized Synthetic Aperture Radar (SAR) technology has gained increasing attention in agriculture, offering unique capabilities for monitoring vegetation dynamics thanks to its all-weather, day-and-night operation and high revisit frequency. This study presents, for the first time, a comprehensive analysis combining the dual-polarimetric radar vegetation index (DpRVI) with optical indices to characterize vineyard crops. Vineyards exhibit distinct non-isotropic scattering behavior due to their pronounced row orientation, making them particularly challenging and interesting targets for remote sensing. The research further investigates the relationship between DpRVI and optical vegetation indices and shows that they provide complementary information, with low correlation suggesting that they capture distinct vineyard features. Key findings reveal a parabolic trend in DpRVI over the growing season, potentially linked to biomass dynamics estimated via the Winkler Index. Unlike optical indices, which reflect vegetation greenness, DpRVI appears more directly related to biomass growth, aligning with specific phenological phases. Preliminary results also highlight the potential of DpRVI for distinguishing vineyards from other crops. This research aligns with the objectives of the PNRR-NODES project, which promotes nature-based solutions (NbS) for sustainable vineyard management. Applying DpRVI to vineyard monitoring is part of integrating remote sensing techniques into broader strategies for climate change adaptation and risk reduction, emphasizing the role of innovative SAR-based monitoring in sustainable agriculture.
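For readers unfamiliar with the Winkler Index mentioned in the abstract: it is conventionally a running sum of growing degree days above a 10 °C base temperature, accumulated over the growing season (1 April to 31 October in the Northern Hemisphere). The abstract does not say how the authors compute it, so the sketch below shows only the conventional definition. The function name `winkler_index` and the sample temperatures are illustrative assumptions, not taken from the paper.

```python
BASE_TEMP_C = 10.0  # conventional base temperature for grapevine growth (assumption: standard definition)


def winkler_index(daily_mean_temps_c, base=BASE_TEMP_C):
    """Return the running sum of growing degree days for a season.

    daily_mean_temps_c: daily mean air temperatures in degrees Celsius,
    ordered by date over the growing season.
    Days colder than the base temperature contribute zero.
    """
    total = 0.0
    cumulative = []
    for temp in daily_mean_temps_c:
        total += max(0.0, temp - base)
        cumulative.append(total)
    return cumulative


# Hypothetical three-day example: one cold day, then two warm days.
season = winkler_index([8.0, 12.0, 15.0])
```

The cumulative series, rather than only the seasonal total, is the form that can be compared against a time series such as DpRVI.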

Deep Policy Optimization with Temporal Logic Constraints

Apr 17, 2024

Ameesh Shah, Cameron Voloshin, Chenxi Yang, Abhinav Verma, Swarat Chaudhuri, Sanjit A. Seshia

Abstract: Temporal logics, such as linear temporal logic (LTL), offer a precise means of specifying tasks for (deep) reinforcement learning (RL) agents. In our work, we consider the setting where the task is specified by an LTL objective and there is an additional scalar reward that we need to optimize. Previous works focus either on learning an LTL task-satisfying policy alone or are restricted to finite state spaces. We make two contributions: First, we introduce an RL-friendly approach to this setting by formulating this problem as a single optimization objective. Our formulation guarantees that an optimal policy will be reward-maximal from the set of policies that maximize the likelihood of satisfying the LTL specification. Second, we address a sparsity issue that often arises for LTL-guided Deep RL policies by introducing Cycle Experience Replay (CyclER), a technique that automatically guides RL agents towards the satisfaction of an LTL specification. Our experiments demonstrate the efficacy of CyclER in finding performant deep RL policies in both continuous and discrete experimental domains.

* preprint, 8 pages

Compositional Policy Learning in Stochastic Control Systems with Formal Guarantees

Dec 03, 2023

Đorđe Žikelić, Mathias Lechner, Abhinav Verma, Krishnendu Chatterjee, Thomas A. Henzinger

Abstract: Reinforcement learning has shown promising results in learning neural network policies for complicated control tasks. However, the lack of formal guarantees about the behavior of such policies remains an impediment to their deployment. We propose a novel method for learning a composition of neural network policies in stochastic environments, along with a formal certificate which guarantees that a specification over the policy's behavior is satisfied with the desired probability. Unlike prior work on verifiable RL, our approach leverages the compositional nature of logical specifications provided in SpectRL to learn over graphs of probabilistic reach-avoid specifications. The formal guarantees are provided by learning neural network policies together with reach-avoid supermartingales (RASM) for the graph's sub-tasks and then composing them into a global policy. We also derive a tighter lower bound compared to previous work on the probability of reach-avoidance implied by a RASM, which is required to find a compositional policy with an acceptable probabilistic threshold for complex tasks with multiple edge policies. We implement a prototype of our approach and evaluate it on a Stochastic Nine Rooms environment.

* Accepted at NeurIPS 2023

Eventual Discounting Temporal Logic Counterfactual Experience Replay

Mar 03, 2023

Cameron Voloshin, Abhinav Verma, Yisong Yue

Abstract: Linear temporal logic (LTL) offers a simplified way of specifying tasks for policy optimization that may otherwise be difficult to describe with scalar reward functions. However, the standard RL framework can be too myopic to find maximally LTL-satisfying policies. This paper makes two contributions. First, we develop a new value-function-based proxy, using a technique we call eventual discounting, under which one can find policies that satisfy the LTL specification with highest achievable probability. Second, we develop a new experience replay method for generating off-policy data from on-policy rollouts via counterfactual reasoning on different ways of satisfying the LTL specification. Our experiments, conducted in both discrete and continuous state-action spaces, confirm the effectiveness of our counterfactual experience replay approach.

Neurosymbolic Reinforcement Learning with Formally Verified Exploration

Oct 26, 2020

Greg Anderson, Abhinav Verma, Isil Dillig, Swarat Chaudhuri

Abstract: We present Revel, a partially neural reinforcement learning (RL) framework for provably safe exploration in continuous state and action spaces. A key challenge for provably safe deep RL is that repeatedly verifying neural networks within a learning loop is computationally infeasible. We address this challenge using two policy classes: a general, neurosymbolic class with approximate gradients and a more restricted class of symbolic policies that allows efficient verification. Our learning algorithm is a mirror descent over policies: in each iteration, it safely lifts a symbolic policy into the neurosymbolic space, performs safe gradient updates to the resulting policy, and projects the updated policy into the safe symbolic subset, all without requiring explicit verification of neural networks. Our empirical results show that Revel enforces safe exploration in many scenarios in which Constrained Policy Optimization does not, and that it can discover policies that outperform those learned through prior approaches to verified exploration.

Learning Differentiable Programs with Admissible Neural Heuristics

Jul 26, 2020

Ameesh Shah, Eric Zhan, Jennifer J. Sun, Abhinav Verma, Yisong Yue, Swarat Chaudhuri

Abstract: We study the problem of learning differentiable functions expressed as programs in a domain-specific language. Such programmatic models can offer benefits such as composability and interpretability; however, learning them requires optimizing over a combinatorial space of program "architectures". We frame this optimization problem as a search in a weighted graph whose paths encode top-down derivations of program syntax. Our key innovation is to view various classes of neural networks as continuous relaxations over the space of programs, which can then be used to complete any partial program. This relaxed program is differentiable and can be trained end-to-end, and the resulting training loss is an approximately admissible heuristic that can guide the combinatorial search. We instantiate our approach on top of the A-star algorithm and an iteratively deepened branch-and-bound search, and use these algorithms to learn programmatic classifiers in three sequence classification tasks. Our experiments show that the algorithms outperform state-of-the-art methods for program learning, and that they discover programmatic classifiers that yield natural interpretations and achieve competitive accuracy.

* 9 pages, under review

Imitation-Projected Policy Gradient for Programmatic Reinforcement Learning

Jul 11, 2019

Abhinav Verma, Hoang M. Le, Yisong Yue, Swarat Chaudhuri

Abstract: We present Imitation-Projected Policy Gradient (IPPG), an algorithmic framework for learning policies that are parsimoniously represented in a structured programming language. Such programmatic policies can be more interpretable, generalizable, and amenable to formal verification than neural policies; however, designing rigorous learning approaches for programmatic policies remains a challenge. IPPG, our response to this challenge, is based on three insights. First, we view our learning task as optimization in policy space, modulo the constraint that the desired policy has a programmatic representation, and solve this optimization problem using a "lift-and-project" perspective that takes a gradient step into the unconstrained policy space and then projects back onto the constrained space. Second, we view the unconstrained policy space as mixing neural and programmatic representations, which enables employing state-of-the-art deep policy gradient approaches. Third, we cast the projection step as program synthesis via imitation learning, and exploit contemporary combinatorial methods for this task. We present theoretical convergence results for IPPG, as well as an empirical evaluation in three continuous control domains. The experiments show that IPPG can significantly outperform state-of-the-art approaches for learning programmatic policies.
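The "gradient step, then project back" pattern described in this abstract has the same shape as ordinary projected gradient descent. The sketch below is only a one-dimensional numeric analogy, not IPPG itself: the constrained set is the interval [0, 1] rather than a space of programs, and the projection is a simple clip rather than program synthesis via imitation learning. All names (`projected_gradient_descent`, `project_to_unit_interval`) and the objective are hypothetical.

```python
def project_to_unit_interval(x):
    """Project a scalar onto the constrained set [0, 1] (stand-in for the projection step)."""
    return min(1.0, max(0.0, x))


def projected_gradient_descent(grad, x0, step=0.1, iterations=50):
    """Alternate an unconstrained gradient step with a projection onto [0, 1]."""
    x = project_to_unit_interval(x0)
    for _ in range(iterations):
        lifted = x - step * grad(x)           # step into the unconstrained space
        x = project_to_unit_interval(lifted)  # project back onto the constrained set
    return x


# Toy objective f(x) = (x - 2)^2, whose unconstrained minimum (x = 2) lies outside [0, 1].
result = projected_gradient_descent(lambda x: 2.0 * (x - 2.0), x0=0.0)
```

In this toy setting the iterate settles on the boundary point of the constrained set closest to the unconstrained optimum.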

Control Regularization for Reduced Variance Reinforcement Learning

May 14, 2019

Richard Cheng, Abhinav Verma, Gabor Orosz, Swarat Chaudhuri, Yisong Yue, Joel W. Burdick

Abstract: Dealing with high variance is a significant challenge in model-free reinforcement learning (RL). Existing methods are unreliable, exhibiting high variance in performance from run to run using different initializations/seeds. Focusing on problems arising in continuous control, we propose a functional regularization approach to augmenting model-free RL. In particular, we regularize the behavior of the deep policy to be similar to a policy prior, i.e., we regularize in function space. We show that functional regularization yields a bias-variance trade-off, and propose an adaptive tuning strategy to optimize this trade-off. When the policy prior has control-theoretic stability guarantees, we further show that this regularization approximately preserves those stability guarantees throughout learning. We validate our approach empirically on a range of settings, and demonstrate significantly reduced variance, guaranteed dynamic stability, and more efficient learning than deep RL alone.

* Appearing in ICML 2019
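One simple way to picture regularizing a learned policy toward a policy prior in function space is to blend the two actions with a weight controlled by a regularization strength. The abstract does not give a formula, so the blend below is an assumption made for illustration: it is the minimizer of ||u - u_policy||² + λ·||u - u_prior||². The names `mix_actions` and `lam` are hypothetical.

```python
def mix_actions(u_policy, u_prior, lam):
    """Blend a learned action with a prior action.

    Returns (u_policy + lam * u_prior) / (1 + lam), element-wise.
    lam = 0 trusts the learned policy fully; large lam follows the prior,
    trading lower variance for more bias toward the prior.
    """
    if lam < 0:
        raise ValueError("lam must be non-negative")
    weight = 1.0 / (1.0 + lam)
    return [weight * a + (1.0 - weight) * b for a, b in zip(u_policy, u_prior)]


# Hypothetical one-dimensional action: equal trust in policy and prior.
blended = mix_actions([2.0], [0.0], lam=1.0)
```

Varying `lam` over the course of training is one way to read the "adaptive tuning strategy" the abstract mentions, although the abstract does not describe the schedule.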

Representing Formal Languages: A Comparison Between Finite Automata and Recurrent Neural Networks

Feb 27, 2019

Joshua J. Michalenko, Ameesh Shah, Abhinav Verma, Richard G. Baraniuk, Swarat Chaudhuri, Ankit B. Patel

Abstract: We investigate the internal representations that a recurrent neural network (RNN) uses while learning to recognize a regular formal language. Specifically, we train an RNN on positive and negative examples from a regular language, and ask if there is a simple decoding function that maps states of this RNN to states of the minimal deterministic finite automaton (MDFA) for the language. Our experiments show that such a decoding function indeed exists, and that it maps states of the RNN not to MDFA states, but to states of an abstraction obtained by clustering small sets of MDFA states into "superstates". A qualitative analysis reveals that the abstraction often has a simple interpretation. Overall, the results suggest a strong structural relationship between internal representations used by RNNs and finite automata, and explain the well-known ability of RNNs to recognize formal grammatical structure.

* 15 Pages, 13 Figures, Accepted to ICLR 2019

Programmatically Interpretable Reinforcement Learning

Jun 08, 2018

Abhinav Verma, Vijayaraghavan Murali, Rishabh Singh, Pushmeet Kohli, Swarat Chaudhuri

Abstract: We present a reinforcement learning framework, called Programmatically Interpretable Reinforcement Learning (PIRL), that is designed to generate interpretable and verifiable agent policies. Unlike the popular Deep Reinforcement Learning (DRL) paradigm, which represents policies by neural networks, PIRL represents policies using a high-level, domain-specific programming language. Such programmatic policies have the benefits of being more easily interpreted than neural networks, and being amenable to verification by symbolic methods. We propose a new method, called Neurally Directed Program Search (NDPS), for solving the challenging nonsmooth optimization problem of finding a programmatic policy with maximal reward. NDPS works by first learning a neural policy network using DRL, and then performing a local search over programmatic policies that seeks to minimize a distance from this neural "oracle". We evaluate NDPS on the task of learning to drive a simulated car in the TORCS car-racing environment. We demonstrate that NDPS is able to discover human-readable policies that pass some significant performance bars. We also show that PIRL policies can have smoother trajectories, and can be more easily transferred to environments not encountered during training, than corresponding policies discovered by DRL.

* Accepted by The 35th International Conference on Machine Learning (ICML 2018)
