In massive multiple-input multiple-output (MIMO) systems, the large number of antennas would bring a great challenge for the acquisition of the accurate channel state information, especially in the frequency division duplex mode. To overcome the bottleneck of the limited number of radio links in hybrid beamforming, we utilize the neural networks (NNs) to capture the inherent connection between the uplink and downlink channel data sets and extrapolate the downlink channels from a subset of the uplink channel state information. We study the antenna subset selection problem in order to achieve the best channel extrapolation and decrease the data size of NNs. The probabilistic sampling theory is utilized to approximate the discrete antenna selection as a continuous and differentiable function, which makes the back propagation of the deep learning feasible. Then, we design the proper off-line training strategy to optimize both the antenna selection pattern and the extrapolation NNs. Finally, numerical results are presented to verify the effectiveness of our proposed massive MIMO channel extrapolation algorithm.