Recently, deep learning has been proposed as a potential technique for improving the physical layer performance of radio receivers. Despite the large amount of encouraging results, most works have not considered spatial multiplexing in the context of multiple-input and multiple-output (MIMO) receivers. In this paper, we present a deep learning-based MIMO receiver architecture that consists of a ResNet-based convolutional neural network, also known as DeepRx, combined with a so-called transformation layer, all trained together. We propose two novel alternatives for the transformation layer: a maximal ratio combining-based transformation, or a fully learned transformation. The former relies more on expert knowledge, while the latter utilizes learned multiplicative layers. Both proposed transformation layers are shown to clearly outperform the conventional baseline receiver, especially with sparse pilot configurations. To the best of our knowledge, these are some of the first results showing such high performance for a fully learned MIMO receiver.