Abstract: Semantic parsing (SP) is a core component of modern virtual assistants like Google Assistant and Amazon Alexa. While sequence-to-sequence autoregressive (AR) approaches are common for conversational semantic parsing, recent studies employ non-autoregressive (NAR) decoders to reduce inference latency while maintaining competitive parsing quality. However, a major drawback of NAR decoders is the difficulty of generating top-k (i.e., k-best) outputs with approaches such as beam search. To address this challenge, we propose a novel NAR semantic parser that introduces intent conditioning on the decoder. Inspired by traditional intent and slot tagging parsers, we decouple the top-level intent prediction from the rest of a parse. As the top-level intent largely governs the syntax and semantics of a parse, the intent conditioning allows the model to better control beam search and improves the quality and diversity of top-k outputs. We introduce a hybrid teacher-forcing approach to avoid training and inference mismatch. We evaluate the proposed NAR parser on two conversational SP datasets, TOP and TOPv2. Like existing NAR models, we maintain O(1) decoding time complexity while generating more diverse outputs and improving the top-3 exact match (EM) by 2.4 points. In comparison with AR models, our model speeds up beam search inference by 6.7 times on CPU with competitive top-k EM.
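The idea of decoupling the top-level intent from the rest of the parse can be illustrated with a minimal sketch. It is one plausible reading of the abstract, not the authors' implementation: the function `topk_intent_conditioned`, the stand-in decoder `toy_decoder`, the label strings, and all probabilities are hypothetical. The sketch assumes that top-k candidates are formed by taking the k most likely intents and decoding the remaining positions independently for each.

```python
import math

def topk_intent_conditioned(intent_logp, token_logp_fn, k):
    """Return up to k (score, intent, tokens) hypotheses, best first.

    intent_logp: dict mapping intent label -> log-probability.
    token_logp_fn: hypothetical stand-in for an NAR decoder; given an intent,
        it returns one dict (token -> log-probability) per output position.
    """
    # Keep the k most likely top-level intents.
    best_intents = sorted(intent_logp, key=intent_logp.get, reverse=True)[:k]
    hyps = []
    for intent in best_intents:
        positions = token_logp_fn(intent)
        # Non-autoregressive step: every position is decided independently.
        tokens = [max(dist, key=dist.get) for dist in positions]
        score = intent_logp[intent] + sum(max(dist.values()) for dist in positions)
        hyps.append((score, intent, tokens))
    hyps.sort(key=lambda h: h[0], reverse=True)
    return hyps

def toy_decoder(intent):
    # Made-up per-position distributions, conditioned on the intent.
    table = {
        "IN:GET_WEATHER": [{"SL:LOCATION": math.log(0.9), "SL:DATE_TIME": math.log(0.1)}],
        "IN:GET_EVENT": [{"SL:LOCATION": math.log(0.4), "SL:DATE_TIME": math.log(0.6)}],
        "IN:UNSUPPORTED": [{"SL:LOCATION": math.log(0.5), "SL:DATE_TIME": math.log(0.5)}],
    }
    return table[intent]

intents = {
    "IN:GET_WEATHER": math.log(0.70),
    "IN:GET_EVENT": math.log(0.25),
    "IN:UNSUPPORTED": math.log(0.05),
}

for score, intent, tokens in topk_intent_conditioned(intents, toy_decoder, k=2):
    print(f"{math.exp(score):.2f}", intent, tokens)
```

In this sketch each hypothesis starts from a different intent, so the candidates differ at the top level by construction, and each one costs a single parallel decoding pass regardless of output length.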
Abstract: The task of predicting stochastic behaviors of road agents in diverse environments is a challenging problem for autonomous driving. To best understand scene contexts and adaptively produce diverse possible future states of the road agents in different environments, a prediction model should be probabilistic, multi-modal, context-driven, and general. We present Conditionalizing Variational AutoEncoders via Hypernetworks (CVAE-H), a conditional VAE that extensively leverages hypernetworks and performs generative tasks for high-dimensional problems such as the prediction task. We first evaluate CVAE-H on simple generative experiments to show that CVAE-H is probabilistic, multi-modal, context-driven, and general. Then, we demonstrate that the proposed model effectively solves a self-driving prediction problem by producing accurate predictions of road agents in various environments.
Abstract: We introduce Hyper-Conditioned Neural Autoregressive Flow (HCNAF), a powerful universal distribution approximator designed to model arbitrarily complex conditional probability density functions. HCNAF consists of a neural-net based conditional autoregressive flow (AF) and a hyper-network that takes large conditions in a non-autoregressive fashion and outputs the network parameters of the AF. Like other flow models, HCNAF performs exact likelihood inference. We demonstrate the effectiveness and attributes of HCNAF, including its generalization capability over unseen conditions, and show that HCNAF outperforms recent flow models in a conditional density estimation task for MNIST. We also show that HCNAF scales up to complex high-dimensional prediction problems of the magnitude of self-driving and that HCNAF yields state-of-the-art performance on a public self-driving dataset.
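The split between a hyper-network and an autoregressive flow, and the exact likelihood it gives, can be sketched in a few lines. This is a simplified illustration under stated assumptions, not HCNAF itself: a two-dimensional affine autoregressive transform stands in for the neural autoregressive flow, and `hypernet` is a hypothetical hand-written linear map in place of a trained network. The log-likelihood follows from the change-of-variables formula with a standard normal base density.

```python
import math

def hypernet(c):
    """Hypothetical hyper-network: maps a scalar condition c to flow parameters.

    Fixed linear maps stand in for a trained network.
    """
    return {
        "shift0": 0.5 * c,       # shift applied to x[0]
        "log_scale0": 0.1 * c,   # log-scale applied to x[0]
        "w10": 0.3 * c,          # dependence of x[1]'s shift on x[0]
        "shift1": -0.2 * c,
        "log_scale1": 0.0,
    }

def forward(x, c):
    """Map x = (x0, x1) to latent z and return (z, log|det dz/dx|)."""
    p = hypernet(c)
    z0 = (x[0] - p["shift0"]) * math.exp(-p["log_scale0"])
    # Autoregressive structure: the transform of x[1] depends on x[0] only.
    mu1 = p["shift1"] + p["w10"] * x[0]
    z1 = (x[1] - mu1) * math.exp(-p["log_scale1"])
    # The Jacobian is triangular, so its log-determinant is a sum of diagonals.
    log_det = -(p["log_scale0"] + p["log_scale1"])
    return (z0, z1), log_det

def log_prob(x, c):
    """Exact log p(x | c) via change of variables."""
    z, log_det = forward(x, c)
    log_base = sum(-0.5 * zi * zi - 0.5 * math.log(2.0 * math.pi) for zi in z)
    return log_base + log_det

print(log_prob((1.0, 0.0), 2.0))
```

The condition enters only through the parameters returned by `hypernet`, so the flow itself stays small even when the condition is large; that separation is the design point the abstract describes.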
Abstract: We propose a novel framework to differentiate between vehicle trajectories originating from human and non-human drivers by constructing a data-driven boundary using parametric signal temporal logic (STL). Such a construction allows us to evaluate the trajectories, detect rare events, and reduce the uncertainty of driver behavior when that behavior assumes the form of a disturbance in control synthesis and evaluation problems. We train a classifier that separates admissible (i.e., human) examples, which arise from real-world demonstrations, from inadmissible (i.e., non-human) examples, which are generated by falsifying specifications synthesized from the same real-world driving data. Proceeding in this fashion allows for finding a reasonable boundary of human behaviors exhibited in real-world driving records. The framework is demonstrated using a case study involving a human-driven vehicle approaching a signalized intersection.
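How a parametric STL formula can act as a boundary is easiest to see through its quantitative (robustness) semantics. The sketch below assumes a single hypothetical template, "always |a(t)| <= theta", with theta as the parameter; the abstract does not state which templates or parameter values the authors use, and the trace values here are made up.

```python
def robustness_always_bounded(signal, theta):
    """Robustness of the STL formula G(|signal| <= theta) on a finite trace.

    A positive value means the formula holds with that margin;
    a negative value means it is violated by that amount.
    """
    return min(theta - abs(v) for v in signal)

def is_admissible(signal, theta):
    """Classify a trace as admissible when the formula is satisfied."""
    return robustness_always_bounded(signal, theta) >= 0.0

# Made-up longitudinal accelerations in m/s^2 for one approach to a signal.
accel_trace = [0.4, -1.2, -2.5, -0.8]

print(robustness_always_bounded(accel_trace, theta=3.0))  # margin of 0.5
print(is_admissible(accel_trace, theta=2.0))              # violated at -2.5
```

Fitting the boundary then amounts to choosing theta so that demonstrated traces have non-negative robustness while falsifying traces have negative robustness.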
Abstract: Predicting future trajectories of human-driven vehicles is a crucial problem in autonomous driving. While the trajectory prediction problem on highways has been well addressed, the problem in city driving, where the motions of vehicles are governed by traffic lights, has barely been discussed. Despite its importance, no comprehensive model that predicts longitudinal trajectories of vehicles near traffic signals is available. Our idea is to use information from vehicle-to-infrastructure communications to model how human drivers drive near traffic signals and to use the model for longitudinal trajectory prediction. We propose a "human policy model" which maps the state of a human-driven vehicle and a traffic signal to a longitudinal acceleration of the vehicle. The proposed model is trained on 471,273 data points sampled from 3,398 real-world historical trips conducted by 583 distinct vehicles near a signalized intersection. We used a neural network for learning a deterministic (most-likelihood) human policy and a mixture density network for learning a probabilistic human policy. Depending on the scenario, our most-likelihood predictions had median errors, measured between the predicted and actual values at 5 seconds into the future, of 0.9-2.3 m for the position and 0.3-0.9 m/s for the speed. This result is far superior to the results obtained from other available models. Our probabilistic policy model provides probabilistic contexts for the predicted trajectories. It is also capable of learning multi-modal distributions, which allows the model to capture competing policies, for example, 'pass' or 'stop' in the yellow-light dilemma zone. Finally, we conducted an ablation study to identify the influence of the state features on the deterministic policy model.
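Two pieces of this pipeline lend themselves to a short sketch: rolling a policy forward to obtain a 5-second longitudinal prediction, and representing competing 'pass' and 'stop' behaviors as a mixture over acceleration. The code assumes explicit Euler integration with a 0.1 s step and a two-component Gaussian mixture with made-up parameters; the function names, the policy signature (the signal state is assumed to be captured inside the policy), and all numbers are hypothetical rather than taken from the paper.

```python
import math

def mixture_pdf(a, weights, means, stds):
    """Density of acceleration a under a one-dimensional Gaussian mixture."""
    return sum(
        w * math.exp(-0.5 * ((a - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
        for w, m, s in zip(weights, means, stds)
    )

def rollout(position, speed, policy, horizon_s=5.0, dt=0.1):
    """Integrate a longitudinal policy forward with explicit Euler steps.

    policy: callable (position, speed) -> acceleration in m/s^2.
    Returns the predicted (position, speed) at the end of the horizon.
    """
    steps = int(round(horizon_s / dt))
    for _ in range(steps):
        accel = policy(position, speed)
        position += speed * dt
        speed = max(0.0, speed + accel * dt)  # assume the vehicle never reverses
    return position, speed

# Made-up mixture for a dilemma-zone state: 'stop' near -2.0, 'pass' near +0.5.
weights, means, stds = [0.5, 0.5], [-2.0, 0.5], [0.5, 0.5]
print(mixture_pdf(-2.0, weights, means, stds))   # high density at the 'stop' mode
print(mixture_pdf(-0.75, weights, means, stds))  # low density between the modes

# Deterministic rollouts for each mode, starting at 10 m/s.
print(rollout(0.0, 10.0, lambda p, v: means[0]))
print(rollout(0.0, 10.0, lambda p, v: means[1]))
```

A single-output regressor fitted to such data would tend toward an acceleration between the two modes, which is the low-density region in this example; that is the motivation for a mixture output.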