The current evolution towards massive numbers of antennas and a wide variety of transceiver architectures makes it necessary to revisit the conventional techniques used to improve the fundamental linearity-efficiency trade-off of the power amplifier (PA). Most digital linearization techniques rely on PA measurements obtained through a dedicated feedback receiver. However, in modern systems with a large number of RF chains and high carrier frequencies, a dedicated receiver per RF chain is costly and complex to implement. This issue can be addressed by measuring the PAs over the air, but in that case the extra signalling shares resources with the actual data transmission. In this paper, we look at the problem from an estimation theory point of view, so as to minimize pilot overhead while optimizing estimation performance. We show that classical results from mathematical statistics can be applied. We find the least squares (LS) optimal training design, which minimizes the maximal mean squared error (MSE) of the reconstructed PA response over its whole input range. Compared to uniform training, simulations demonstrate a tenfold reduction of the maximal MSE for a PA polynomial order of L = 7. Using prior information, the linear minimum mean squared error (LMMSE) estimator can achieve an additional gain of up to a factor of 300 at low signal-to-noise ratio (SNR).
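As a rough illustration of the min-max idea, the sketch below compares the worst-case normalized prediction variance of an LS polynomial fit under uniform training and under the classical G-optimal design from the statistics literature, whose support is the interval endpoints plus the roots of the derivative of a Legendre polynomial. It is not the paper's PA model: it assumes a real-valued polynomial with 7 coefficients (degree 6), an input range rescaled to [-1, 1], and white noise. All function names are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre as leg


def g_optimal_points(degree):
    """Support of the classical D/G-optimal design for polynomial regression
    on [-1, 1]: both endpoints plus the roots of P'_degree (Legendre)."""
    interior = np.real(leg.Legendre.basis(degree).deriv().roots())
    return np.concatenate(([-1.0], np.sort(interior), [1.0]))


def max_normalized_variance(points, degree, grid):
    """Maximum over `grid` of N * phi(x)^T (Phi^T Phi)^{-1} phi(x),
    i.e. the worst-case LS prediction variance per unit noise variance,
    normalized by the number of training points N."""
    Phi = leg.legvander(points, degree)   # training regressors (Legendre basis)
    G = leg.legvander(grid, degree)       # regressors on the evaluation grid
    M = Phi.T @ Phi / len(points)         # normalized information matrix
    d = np.einsum("ij,ji->i", G, np.linalg.solve(M, G.T))
    return d.max()


degree = 6                                # assumption: 7 real coefficients
grid = np.linspace(-1.0, 1.0, 2001)       # assumption: input range is [-1, 1]

uniform = np.linspace(-1.0, 1.0, 70)      # 70 equally spaced training inputs
optimal = g_optimal_points(degree)        # 7 support points, equal weights

v_uniform = max_normalized_variance(uniform, degree, grid)
v_optimal = max_normalized_variance(optimal, degree, grid)

print("worst-case normalized variance, uniform:  ", v_uniform)
print("worst-case normalized variance, G-optimal:", v_optimal)
```

Because the variance is normalized by the number of training points, the comparison holds at an equal pilot budget: repeating each of the 7 optimal points ten times gives the same normalized value as using them once. For the optimal design, the worst-case value equals the number of coefficients, which is the lower bound given by the Kiefer-Wolfowitz equivalence theorem.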