Abstract:Designing a driving policy for autonomous vehicles is a difficult task. Recent studies suggested an end-toend (E2E) training of a policy to predict car actuators directly from raw sensory inputs. It is appealing due to the ease of labeled data collection and since handcrafted features are avoided. Explicit drawbacks such as interpretability, safety enforcement and learning efficiency limit the practical application of the approach. In this paper, we amend the basic E2E architecture to address these shortcomings, while retaining the power of end-to-end learning. A key element in our proposed architecture is formulation of the learning problem as learning of trajectory. We also apply a Gaussian mixture model loss to contend with multi-modal data, and adopt a finance risk measure, conditional value at risk, to emphasize rare events. We analyze the effect of each concept and present driving performance in a highway scenario in the TORCS simulator. Video is available in this link: https://www.youtube.com/watch?v=1JYNBZNOe_4