Human Activity Recognition (HAR) is a key building block of many emerging applications such as intelligent mobility, sports analytics, ambient-assisted living and human-robot interaction. With robust HAR, systems will become more human-aware, leading towards safer and more empathetic autonomous systems. While human pose detection has made significant progress with the advent of deep convolutional neural networks (CNNs), state-of-the-art research has focused almost exclusively on a single sensing modality, especially video. However, in safety-critical applications it is imperative to utilize multiple sensor modalities for robust operation. To exploit the benefits of state-of-the-art machine learning techniques for HAR, multimodal datasets are essential. In this paper, we present a novel multimodal sensor dataset that encompasses nine indoor activities, performed by 16 participants, and captured by four types of sensors that are commonly used in indoor applications and autonomous vehicles. This multimodal dataset is the first of its kind to be made openly available and can be exploited for many applications that require HAR, including sports analytics, healthcare assistance and indoor intelligent mobility. We propose a novel data preprocessing algorithm that enables adaptive feature extraction from the dataset for use by different machine learning algorithms. Through rigorous experimental evaluations, this paper reviews the performance of machine learning approaches to posture recognition and analyses the robustness of the algorithms. When performing HAR with the RGB-Depth data from our new dataset, machine learning algorithms such as deep neural networks reached a mean accuracy of up to 96.8% for classification across all stationary and dynamic activities.