Self-supervised learning (SSL) is a reliable learning mechanism in which a robot enhances its perceptual capabilities. Typically, in SSL a trusted, primary sensor cue provides supervised training data to a secondary sensor cue. In this article, a theoretical analysis is performed on the fusion of the primary and secondary cue in a minimal model of SSL. A proof is provided that determines the specific conditions under which it is favorable to perform fusion. In short, it is favorable when (i) the prior on the target value is strong or (ii) the secondary cue is sufficiently accurate. The theoretical findings are validated with computational experiments. Subsequently, a real-world case study is performed to investigate if fusion in SSL is also beneficial when assumptions of the minimal model are not met. In particular, a flying robot learns to map pressure measurements to sonar height measurements and then fuses the two, resulting in better height estimation. Fusion is also beneficial in the opposite case, when pressure is the primary cue. The analysis and results are encouraging to study SSL fusion also for other robots and sensors.