We study the kernel instrumental variable algorithm of \citet{singh2019kernel}, a nonparametric two-stage least squares (2SLS) procedure which has demonstrated strong empirical performance. We provide a convergence analysis that covers both the identified and unidentified settings: when the structural function cannot be identified, we show that the kernel NPIV estimator converges to the IV solution with minimum norm. Crucially, our convergence is with respect to the strong $L_2$-norm, rather than a pseudo-norm. Additionally, we characterize the smoothness of the target function without relying on the instrument, instead leveraging a new description of the projected subspace size (this being closely related to the link condition in inverse learning literature). With the subspace size description and under standard kernel learning assumptions, we derive, for the first time, the minimax optimal learning rate for kernel NPIV in the strong $L_2$-norm. Our result demonstrates that the strength of the instrument is essential to achieve efficient learning. We also improve the original kernel NPIV algorithm by adopting a general spectral regularization in stage 1 regression. The modified regularization can overcome the saturation effect of Tikhonov regularization.