The goal of active learning is to achieve the same accuracy achievable by passive learning, while using much fewer labels. Exponential savings in label complexity are provably guaranteed in very special cases, but fundamental lower bounds show that such improvements are impossible in general. This suggests a need to explore alternative goals for active learning. Learning with abstention is one such alternative. In this setting, the active learning algorithm may abstain from prediction in certain cases and incur an error that is marginally smaller than $\frac{1}{2}$. We develop the first computationally efficient active learning algorithm with abstention. Furthermore, the algorithm is guaranteed to only abstain on hard examples (where the true label distribution is close to a fair coin), a novel property we term "proper abstention" that also leads to a host of other desirable characteristics. The option to abstain reduces the label complexity by an exponential factor, with no assumptions on the distribution, relative to passive learning algorithms and/or active learning that are not allowed to abstain. A key feature of the algorithm is that it avoids the undesirable "noise-seeking" behavior often seen in active learning. We also explore extensions that achieve constant label complexity and deal with model misspecification.