Deep learning (DL) methods have been shown to improve performance in several use cases for the fifth-generation (5G) New Radio (NR) air interface. In this paper, we investigate user equipment (UE) positioning using channel state information (CSI) fingerprints between a UE and multiple base stations (BSs). In such a setup, a single DL model can be trained for UE positioning using the CSI fingerprints of all BSs as input. Alternatively, a separate DL model can be trained at each BS on that BS's CSI, and the outputs of the different models are then combined to determine the UE's position. In this work, we compare these fusion techniques and show that fusing the outputs of separate models achieves higher positioning accuracy, especially in a dynamic scenario. We also show that the fusion of multiple outputs further benefits from considering the uncertainty of the output of the DL model at each BS. To train the DL models across BSs more efficiently, we additionally propose a multi-task learning (MTL) scheme that shares some parameters across the models while training all models jointly. This method improves the accuracy not only of the individual models but also of the final combined estimate. Lastly, we evaluate the reliability of the uncertainty estimation to determine which of the fusion methods provides the highest-quality uncertainty estimates.
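
As an illustration of what uncertainty-aware fusion of per-BS outputs can look like, the following minimal sketch combines 2D position estimates by inverse-variance weighting. This is one common fusion rule and is not necessarily the one used in this work. The function name `fuse_estimates`, the per-axis variance representation, and the assumption of independent Gaussian errors across BSs are all hypothetical choices made for the example.

```python
from typing import Sequence, Tuple

Vec2 = Tuple[float, float]


def fuse_estimates(
    positions: Sequence[Vec2],
    variances: Sequence[Vec2],
) -> Tuple[Vec2, Vec2]:
    """Fuse per-BS 2D position estimates by inverse-variance weighting.

    Assumption (not from the paper): each BS model outputs a position and a
    per-axis variance, and errors are independent across BSs and axes.
    """
    if not positions or len(positions) != len(variances):
        raise ValueError("need one variance pair per position estimate")

    fused_pos = []
    fused_var = []
    for axis in range(2):
        # A more certain estimate (smaller variance) gets a larger weight.
        weights = [1.0 / var[axis] for var in variances]
        total = sum(weights)
        weighted_sum = sum(w * pos[axis] for w, pos in zip(weights, positions))
        fused_pos.append(weighted_sum / total)
        # Variance of the fused estimate under the independence assumption.
        fused_var.append(1.0 / total)

    return (fused_pos[0], fused_pos[1]), (fused_var[0], fused_var[1])


# Two hypothetical BS outputs: the second is less certain along x.
position, variance = fuse_estimates(
    positions=[(10.0, 5.0), (12.0, 7.0)],
    variances=[(1.0, 1.0), (3.0, 1.0)],
)
print(position, variance)
```

With equal variances the rule reduces to a plain average of the per-BS outputs. With unequal variances the fused estimate is pulled toward the more confident BS, which is why the fused result depends on how reliable the per-BS uncertainty estimates are.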