We consider the problem of active linear regression with $\ell_2$-bounded noise. In this setting, the learner receives a set of unlabeled data points, chooses a small subset for which to receive labels, and must output an estimate of the underlying function that performs well on fresh samples. We give an algorithm that is simultaneously optimal in the number of labeled and unlabeled data points, using only $O(d)$ labeled samples; previous work required $\Omega(d \log d)$ labeled samples regardless of the number of unlabeled samples. Our results also apply to learning linear functions from noisy queries, again achieving optimal sample complexities. Our techniques extend beyond linear functions, giving improved sample complexities for learning the family of $k$-Fourier-sparse signals with continuous frequencies.
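For readers unfamiliar with the active regression setup, the following is a minimal sketch (in Python/NumPy) of the standard leverage-score sampling baseline: sample rows with probability proportional to their leverage scores, query labels only at the sampled rows, and solve the reweighted least-squares problem. This is the classical $O(d \log d)$-label approach from the prior work referenced above, not the paper's refined $O(d)$-label algorithm; all function names (`leverage_scores`, `active_regression`, `label_oracle`) and the $10d$ label budget are illustrative choices of ours.

```python
import numpy as np

def leverage_scores(X):
    """Leverage score of row i is ||U_i||^2, where U is an
    orthonormal basis for the column space of X."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return np.sum(U**2, axis=1)

def active_regression(X, label_oracle, num_labels, seed=None):
    """Query `num_labels` labels at rows sampled proportionally to
    leverage scores, then solve the reweighted least-squares problem."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    tau = leverage_scores(X)
    p = tau / tau.sum()                       # sampling distribution (sums to 1)
    idx = rng.choice(n, size=num_labels, p=p)
    w = 1.0 / np.sqrt(num_labels * p[idx])    # reweight so the sampled objective
                                              # is unbiased for the full one
    y = np.array([label_oracle(i) for i in idx])  # the only label queries made
    beta, *_ = np.linalg.lstsq(w[:, None] * X[idx], w * y, rcond=None)
    return beta

# Usage: noisy labels are revealed only for the queried points.
rng = np.random.default_rng(0)
n, d = 5000, 20
X = rng.standard_normal((n, d))
beta_true = rng.standard_normal(d)
noise = 0.1 * rng.standard_normal(n)          # l2-bounded noise, for illustration
oracle = lambda i: X[i] @ beta_true + noise[i]
beta_hat = active_regression(X, oracle, num_labels=10 * d, seed=1)
print(np.linalg.norm(beta_hat - beta_true))
```

The reweighting by $1/\sqrt{m \, p_i}$ makes the subsampled squared loss an unbiased estimate of the full-data loss, which is why uniform sampling can be replaced by leverage scores without biasing the estimator; the paper's contribution is removing the $\log d$ oversampling factor that this sampling-based analysis requires.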