Gait recognition refers to the identification of individuals based on features acquired from their body movement during walking. Despite the recent advances in gait recognition with deep learning, variations in data acquisition and appearance, namely camera angles, subject pose, occlusions, and clothing, are challenging factors that need to be considered for achieving accurate gait recognition systems. In this paper, we propose a network that first learns to extract gait convolutional energy maps (GCEM) from frame-level convolutional features. It then adopts a bidirectional recurrent neural network to learn from split bins of the GCEM, thus exploiting the relations between learned partial spatiotemporal representations. We then use an attention mechanism to selectively focus on important recurrently learned partial representations as identity information in different scenarios may lie in different GCEM bins. Our proposed model has been extensively tested on two large-scale CASIA-B and OU-MVLP gait datasets using four different test protocols and has been compared to a number of state-of-the-art and baseline solutions. Additionally, a comprehensive experiment has been performed to study the robustness of our model in the presence of six different synthesized occlusions. The experimental results show the superiority of our proposed method, outperforming the state-of-the-art, especially in scenarios where different clothing and carrying conditions are encountered. The results also revealed that our model is more robust against different occlusions as compared to the state-of-the-art methods.