Heralded by the initial success in speech recognition and image classification, learning-based approaches with neural networks, commonly referred to as deep learning, have spread across various fields. A primitive form of a neural network functions as a deterministic mapping from one vector to another, parameterized by trainable weights. This is well suited for point estimation in which the model learns a one-to-one mapping (e.g., mapping a front camera view to a steering angle) that is required to solve the task of interest. Although learning such a deterministic, one-to-one mapping is effective, there are scenarios where modeling \emph{multimodal} data distributions, namely learning one-to-many relationships, is helpful or even necessary. In this thesis, we adopt a generative modeling perspective on robotics problems. Generative models learn and produce samples from multimodal distributions, rather than performing point estimation. We will explore the advantages this perspective offers for three topics in robotics.