Abstract: In cognitive systems, recent emphasis has been placed on studying the cognitive processes of a subject whose behavior is the primary focus of the system's response. This approach, known as inverse cognition, arises in counter-adversarial applications and has motivated the development of inverse Bayesian filters. In this context, a cognitive adversary, such as a radar, uses a forward Bayesian filter to track its target of interest. An inverse filter is then employed to infer the adversary's estimate of the target's or defender's state. Previous studies have addressed this inverse filtering problem by introducing methods such as the inverse Kalman filter (I-KF), inverse extended KF (I-EKF), and inverse unscented KF (I-UKF). However, these inverse filters assume additive Gaussian noises and/or rely on local approximations of the non-linear dynamics at the state estimates, limiting their practical applicability. In contrast, this paper adopts a global filtering approach and develops an inverse particle filter (I-PF). The particle filter framework employs Monte Carlo (MC) methods to approximate arbitrary posterior distributions. Moreover, under mild system-level conditions, the proposed I-PF converges to the optimal inverse filter. Additionally, we explore MC techniques to approximate Gaussian posteriors and introduce the inverse Gaussian PF (I-GPF) and inverse ensemble KF (I-EnKF). With suitable modifications, our I-GPF and I-EnKF can efficiently handle non-Gaussian noises. We further propose differentiable I-PF, differentiable I-EnKF, and reproducing kernel Hilbert space-based EnKF (RKHS-EnKF) methods to address scenarios where the system information is unknown to the defender. Using the recursive Cram\'er-Rao lower bound and the non-credibility index (NCI), our numerical experiments for different applications demonstrate the estimation performance and time complexity of the proposed filters.
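As background for the particle filter framework referenced above, the following is a minimal sketch of a standard bootstrap (SIR) particle filter on the classical scalar growth benchmark model; it is a textbook illustration of the MC posterior approximation, not the paper's I-PF, and the model, noise levels, and particle count are illustrative assumptions.

```python
import numpy as np

def bootstrap_pf(y, n_particles=500, q=1.0, r=1.0, rng=None):
    """Bootstrap (SIR) particle filter for the benchmark scalar model
    x_k = 0.5*x_{k-1} + 25*x_{k-1}/(1+x_{k-1}**2) + v_k,  v_k ~ N(0, q)
    y_k = x_k**2/20 + w_k,                                w_k ~ N(0, r).
    Returns the posterior-mean (MMSE) state estimates."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.normal(0.0, 1.0, n_particles)          # initial particle cloud
    est = []
    for yk in y:
        # propagate particles through the transition model (prior proposal)
        x = 0.5*x + 25*x/(1 + x**2) + rng.normal(0, np.sqrt(q), n_particles)
        # weight each particle by the observation likelihood
        logw = -0.5*(yk - x**2/20)**2 / r
        w = np.exp(logw - logw.max())              # subtract max for stability
        w /= w.sum()
        est.append(np.dot(w, x))                   # weighted posterior mean
        # multinomial resampling to combat weight degeneracy
        x = x[rng.choice(n_particles, n_particles, p=w)]
    return np.array(est)
```

The same propagate-weight-resample loop underlies any PF variant; only the proposal and weighting change.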
Abstract: This paper addresses the problem of detecting false data injection (FDI) attacks in a distributed network without a fusion center, represented by a connected graph among multiple agent nodes. Each agent node is equipped with a sensor and uses a Kalman consensus information filter (KCIF) to track a discrete-time global process with linear dynamics and additive Gaussian noise. The state estimate of the global process at any sensor is computed from the local observation history and the information received by that agent node from its neighbors. At an unknown time, an attacker starts altering the local observation of one agent node. In the Bayesian setting, where the attack onset has a known prior distribution, we formulate a Bayesian quickest change detection (QCD) problem for FDI detection that minimizes the mean detection delay subject to a false alarm probability constraint. While it is well known that the optimal Bayesian QCD rule involves checking the Shiryaev statistic against a threshold, we demonstrate how to compute the Shiryaev statistic at each node in a recursive fashion given our non-i.i.d. observations. Next, we consider non-Bayesian QCD, where the attack begins at an arbitrary and unknown time and the detector seeks to minimize the worst-case detection delay subject to constraints on the mean time to false alarm and the probability of misidentification. We use the multiple-hypothesis sequential probability ratio test for attack detection and identification at each sensor. For an unknown attack strategy, we use the window-limited generalized likelihood ratio (WL-GLR) algorithm to solve the QCD problem. Numerical results demonstrate the performance and trade-offs of the proposed algorithms.
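For context on the Shiryaev statistic mentioned above, here is the classical recursion for the i.i.d. case with a geometric change-point prior; the paper's contribution is the non-i.i.d. extension for KCIF observations, so this sketch is only the textbook baseline, and the densities and threshold are illustrative assumptions.

```python
import numpy as np

def shiryaev_statistic(y, rho, f0, f1):
    """Recursive Shiryaev posterior probability that a change has occurred,
    for i.i.d. observations with a geometric(rho) change-point prior.
    f0, f1: pre-/post-change likelihoods, callables y -> density."""
    p = 0.0
    out = []
    for yk in y:
        prior = p + (1 - p) * rho          # prob. change occurred by time k
        num = prior * f1(yk)
        den = num + (1 - prior) * f0(yk)
        p = num / den                      # Bayes update with new sample
        out.append(p)
    return out

def first_crossing(p, thresh):
    """Declare an alarm at the first time the statistic exceeds a threshold."""
    for k, pk in enumerate(p):
        if pk >= thresh:
            return k
    return None
```

The optimal Bayesian QCD rule stops at the first threshold crossing of this posterior probability.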
Abstract: In this work, we propose a novel inverse reinforcement learning (IRL) algorithm for constrained Markov decision process (CMDP) problems. In standard IRL problems, the inverse learner or agent seeks to recover the reward function of the MDP, given a set of trajectory demonstrations of the optimal policy. In this work, we seek to infer not only the reward function of the CMDP but also the constraints. Using the principle of maximum entropy, we show that the IRL with constraint recovery (IRL-CR) problem can be cast as a constrained non-convex optimization problem. We reduce it to an alternating constrained optimization problem whose sub-problems are convex, and use the exponentiated gradient descent algorithm to solve it. Finally, we demonstrate the efficacy of our algorithm in a grid-world environment.
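To illustrate the exponentiated gradient descent primitive used above, here is a minimal generic sketch of the multiplicative-weights update on the probability simplex; it is a standard optimization building block (common in maximum-entropy formulations), not the paper's IRL-CR algorithm, and the step size and objective are illustrative assumptions.

```python
import numpy as np

def exponentiated_gradient(grad, x0, eta=0.1, iters=200):
    """Exponentiated gradient descent on the probability simplex:
    x <- x * exp(-eta * grad(x)), then renormalize. The iterate stays
    a valid distribution at every step, which suits objectives over
    policies or occupancy measures."""
    x = np.asarray(x0, float)
    for _ in range(iters):
        g = grad(x)
        x = x * np.exp(-eta * (g - g.max()))   # shift by max for stability
        x = x / x.sum()                        # project back to the simplex
    return x
```

For a linear objective the iterates concentrate exponentially fast on the minimizing coordinate.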
Abstract: Rapid advances in designing cognitive and counter-adversarial systems have motivated the development of inverse Bayesian filters. In this setting, a cognitive `adversary' tracks its target of interest via a stochastic framework such as a Kalman filter (KF). The target or `defender' then employs another inverse stochastic filter to infer the adversary's forward-filter estimates of the defender's state. For linear systems, the inverse Kalman filter (I-KF) has recently been shown to be effective in these counter-adversarial applications. In this paper, contrary to prior works, we focus on non-linear system dynamics and formulate the inverse unscented KF (I-UKF) to estimate the defender's state with reduced linearization errors. We then generalize this framework to an unknown system model by proposing a reproducing kernel Hilbert space-based UKF (RKHS-UKF) to learn the system dynamics and estimate the state based on its observations. Our theoretical analyses of the stochastic stability of I-UKF and RKHS-UKF in the mean-squared sense show that, provided the forward filters are stable, the inverse filters are also stable under mild system-level conditions. Our numerical experiments for several different applications demonstrate the state estimation performance of the proposed filters using the recursive Cram\'{e}r-Rao lower bound as a benchmark.
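As background for the unscented machinery referenced above, the following sketches the standard Julier–Uhlmann unscented transform at the heart of any UKF: deterministic sigma points propagate a Gaussian through a nonlinearity without Jacobians. This is the generic textbook transform, not the paper's I-UKF or RKHS-UKF; the scaling parameters are the conventional defaults.

```python
import numpy as np

def unscented_transform(mean, cov, f, alpha=1e-3, beta=2.0, kappa=0.0):
    """Propagate a Gaussian (mean, cov) through a nonlinearity f using
    the standard 2n+1 scaled sigma points; returns the transformed
    mean and covariance. Exact for linear f."""
    n = len(mean)
    lam = alpha**2 * (n + kappa) - n
    S = np.linalg.cholesky((n + lam) * cov)        # scaled matrix square root
    sigma = np.vstack([mean, mean + S.T, mean - S.T])   # (2n+1, n) points
    wm = np.full(2*n + 1, 1.0 / (2*(n + lam)))     # mean weights
    wc = wm.copy()                                 # covariance weights
    wm[0] = lam / (n + lam)
    wc[0] = lam / (n + lam) + (1 - alpha**2 + beta)
    Y = np.array([f(s) for s in sigma])            # push points through f
    y_mean = wm @ Y
    d = Y - y_mean
    y_cov = (wc[:, None] * d).T @ d
    return y_mean, y_cov
```

A UKF simply applies this transform twice per step, once for the time update and once for the measurement update.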
Abstract: Recent developments in counter-adversarial system research have led to the development of inverse stochastic filters that are employed by a defender to infer the information its adversary may have learned. Prior works addressed this inverse cognition problem by proposing the inverse Kalman filter (I-KF) and inverse extended KF (I-EKF) for linear and non-linear Gaussian state-space models, respectively. However, in practice, many counter-adversarial settings involve highly non-linear system models, wherein the EKF's linearization often fails. In this paper, we consider efficient numerical integration techniques to address such non-linearities and, to this end, develop the inverse cubature KF (I-CKF) and inverse quadrature KF (I-QKF). We derive stochastic stability conditions for the proposed filters in the exponential-mean-squared-boundedness sense. Numerical experiments demonstrate the estimation accuracy of our I-CKF and I-QKF with the recursive Cram\'{e}r-Rao lower bound as a benchmark.
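To make the cubature-based numerical integration concrete, here is a sketch of the third-degree spherical-radial cubature rule underlying any CKF: 2n equally weighted points, no Jacobians and no tunable parameters. This is the generic Arasaratnam-Haykin rule, not the paper's I-CKF, and the test nonlinearity is an illustrative assumption.

```python
import numpy as np

def cubature_transform(mean, cov, f):
    """Third-degree spherical-radial cubature rule: propagate a Gaussian
    (mean, cov) through f using 2n equally weighted points
    m +/- sqrt(n) * columns of chol(cov). Exact for linear f."""
    n = len(mean)
    L = np.linalg.cholesky(cov)
    pts = np.vstack([mean + np.sqrt(n) * L[:, i] for i in range(n)] +
                    [mean - np.sqrt(n) * L[:, i] for i in range(n)])
    Y = np.array([f(p) for p in pts])      # push cubature points through f
    y_mean = Y.mean(axis=0)                # equal weights 1/(2n)
    d = Y - y_mean
    y_cov = d.T @ d / (2 * n)
    return y_mean, y_cov
```

Compared with the unscented transform, the cubature rule drops the center point and the free parameters, which avoids the negative-weight issues that can arise in high dimensions.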
Abstract: We study learning in periodic Markov Decision Processes (MDPs), a special class of non-stationary MDPs where both the state transition probabilities and reward functions vary periodically, under the average reward maximization setting. We formulate the problem as a stationary MDP by augmenting the state space with the period index, and propose a periodic upper confidence bound reinforcement learning-2 (PUCRL2) algorithm. We show that the regret of PUCRL2 varies linearly with the period $N$ and as $\mathcal{O}(\sqrt{T\log T})$ with the horizon length $T$. Utilizing the sparsity of the transition matrix of the augmented MDP, we propose another algorithm, PUCRLB, which improves upon PUCRL2 both in regret (an $\mathcal{O}(\sqrt{N})$ dependency on the period) and in empirical performance. Finally, we propose two further algorithms, U-PUCRL2 and U-PUCRLB, for extended uncertainty in the environment, in which the period is unknown but a set of candidate periods is known. Numerical results demonstrate the efficacy of all the algorithms.
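The state-augmentation step described above can be sketched directly: fold the $N$ per-phase transition and reward arrays into one stationary MDP whose state is (state, phase). This is a generic construction implied by the abstract, not the PUCRL2 algorithm itself; the array layout is an illustrative assumption.

```python
import numpy as np

def augment_periodic_mdp(P_list, R_list):
    """Fold a periodic MDP (N per-phase arrays) into a stationary MDP on
    the augmented state (s, phase). P_list[t]: (S, A, S) transitions in
    phase t; R_list[t]: (S, A) rewards. Transitions from phase t land
    only in the phase (t+1) mod N block, hence the sparsity PUCRLB-style
    methods can exploit."""
    N, S, A = len(P_list), P_list[0].shape[0], P_list[0].shape[1]
    P = np.zeros((N * S, A, N * S))
    R = np.zeros((N * S, A))
    for t in range(N):
        nxt = (t + 1) % N
        P[t*S:(t+1)*S, :, nxt*S:(nxt+1)*S] = P_list[t]
        R[t*S:(t+1)*S, :] = R_list[t]
    return P, R
```

Any stationary average-reward algorithm can then run on the augmented arrays, at the cost of an $N$-fold larger state space.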
Abstract: Multiple-input multiple-output (MIMO) radar offers several advantages over traditional radar array systems in terms of performance and flexibility. However, to achieve high angular resolution, a MIMO radar requires a large number of transmit and receive antennas, which increases hardware design and computational complexity. Although spatial compressive sensing (CS) has recently been considered for pulsed-waveform MIMO radars with sparse random arrays, such methods remain largely unexplored for the frequency-modulated continuous-wave (FMCW) radar. In this context, we propose a novel multi-target localization algorithm in the range-angle domain for a MIMO FMCW radar with a sparse array of randomly placed transmit and receive elements. In particular, we first obtain the targets' range-delays using a discrete Fourier transform (DFT)-based focusing operation. The target angles are then recovered at each detected range using CS-based techniques that exploit the sparsity of the target scene. Our simulation results demonstrate the effectiveness of the proposed algorithm over classical methods in detecting multiple targets with a sparse array.
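As a concrete example of the CS-based recovery step mentioned above, here is a minimal orthogonal matching pursuit (OMP) sketch, a standard greedy baseline for recovering a sparse angle profile from compressed measurements. It is a generic CS primitive under an assumed dictionary, not the paper's algorithm.

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal matching pursuit: greedily recover a k-sparse x from
    y = A x. Each iteration picks the dictionary atom most correlated
    with the residual, then re-fits all selected atoms by least squares."""
    residual, support = y.astype(complex), []
    for _ in range(k):
        j = int(np.argmax(np.abs(A.conj().T @ residual)))  # best new atom
        support.append(j)
        As = A[:, support]
        coef, *_ = np.linalg.lstsq(As, y, rcond=None)      # re-fit support
        residual = y - As @ coef                           # orthogonalize
    x = np.zeros(A.shape[1], complex)
    x[support] = coef
    return x
```

In a sparse-array radar context, the columns of `A` would be array steering vectors on an angle grid; here the dictionary is left abstract.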
Abstract: To infer the strategy of an intelligent attacker, the defender must cognitively sense the attacker's state. In this context, we aim to learn, from a Bayesian perspective, the information that an adversary has gathered about us. Prior works employ linear Gaussian state-space models and solve this inverse cognition problem by designing inverse stochastic filters. In practice, however, these counter-adversarial settings involve highly non-linear systems. We address this by formulating inverse cognition as a non-linear Gaussian state-space model, wherein the adversary employs an unscented Kalman filter (UKF) to estimate our state with reduced linearization errors. To estimate the adversary's estimate of us, we propose and develop an inverse UKF (IUKF), wherein the system model is known to both the adversary and the defender. We also derive conditions for the stochastic stability of the IUKF in the mean-squared-boundedness sense. Numerical experiments for multiple practical system models show that the estimation error of the IUKF converges and closely follows the recursive Cram\'{e}r-Rao lower bound.
Abstract: Recent counter-adversarial system design problems have motivated the development of inverse Bayesian filters. For example, the inverse Kalman filter (I-KF) has recently been formulated to estimate the adversary's KF-tracked estimates and hence predict the adversary's future steps. The purpose of this paper and its companion (Part I) is to address the inverse filtering problem for non-linear systems by proposing an inverse extended Kalman filter (I-EKF). The companion paper (Part I) developed the theory of the I-EKF (with and without unknown inputs) and of the I-KF (with unknown inputs). In this paper, we develop this theory for highly non-linear models, which employ second-order, Gaussian-sum, and dithered forward EKFs. In particular, we derive theoretical stability guarantees for the inverse second-order EKF using the bounded non-linearity approach. To address the limitation of standard I-EKFs that the system model and forward filter be perfectly known to the defender, we propose a reproducing kernel Hilbert space-based EKF to learn the unknown system dynamics from its observations, which can then be employed as an inverse filter to infer the adversary's estimate. Numerical experiments demonstrate the state estimation performance of the proposed filters using the recursive Cram\'{e}r-Rao lower bound as a benchmark.
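To give a flavor of the RKHS-based learning of unknown dynamics mentioned above, here is a minimal kernel ridge regression sketch that fits a non-linear map from state pairs; it is a generic RKHS regression primitive, not the paper's RKHS-EKF, and the Gaussian kernel and hyperparameters are illustrative assumptions.

```python
import numpy as np

def fit_krr(X, Y, gamma=1.0, lam=1e-3):
    """Kernel ridge regression with a Gaussian (RBF) kernel: learn an
    approximation of an unknown map (e.g. the transition x_k -> x_{k+1})
    from training pairs (X[i], Y[i]). Returns a predictor function.
    X: (m, d) inputs; Y: (m,) or (m, p) outputs."""
    def K(A, B):
        d2 = ((A[:, None, :] - B[None, :, :])**2).sum(-1)  # pairwise sq. dist
        return np.exp(-gamma * d2)
    # representer theorem: predictor is a kernel expansion over training inputs
    alpha = np.linalg.solve(K(X, X) + lam * np.eye(len(X)), Y)
    return lambda x: K(np.atleast_2d(x), X) @ alpha
```

In a filtering context, such a learned map can stand in for the unknown transition function inside the forward or inverse filter's prediction step.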
Abstract: We study learning in periodic Markov Decision Processes (MDPs), a special class of non-stationary MDPs where both the state transition probabilities and reward functions vary periodically, under the average reward maximization setting. We formulate the problem as a stationary MDP by augmenting the state space with the period index, and propose a periodic upper confidence bound reinforcement learning-2 (PUCRL2) algorithm. We show that the regret of PUCRL2 varies linearly with the period and sub-linearly with the horizon length. Numerical results demonstrate the efficacy of PUCRL2.