There is a fast-growing literature on estimating optimal treatment regimes based on randomized trials or observational studies under a key identifying condition of no unmeasured confounding. Because confounding by unmeasured factors cannot generally be ruled out with certainty in observational studies or randomized trials subject to noncompliance, we propose a general instrumental variable approach to learning optimal treatment regimes under endogeneity. Specifically, we provide sufficient conditions for the identification of both value function $E[Y_{\cD(L)}]$ for a given regime $\cD$ and optimal regime $\arg \max_{\cD} E[Y_{\cD(L)}]$ with the aid of a binary instrumental variable, when no unmeasured confounding fails to hold. We establish consistency of the proposed weighted estimators. We also extend the proposed method to identify and estimate the optimal treatment regime among those who would comply to the assigned treatment under monotonicity. In this latter case, we establish the somewhat surprising result that the complier optimal regime can be consistently estimated without directly collecting compliance information and therefore without the complier average treatment effect itself being identified. Furthermore, we propose novel semiparametric locally efficient and multiply robust estimators. Our approach is illustrated via extensive simulation studies and a data application on the effect of child rearing on labor participation.