This paper proposes a convex formulation for sparse multicategory linear discriminant analysis and then extends it to the distributed setting, in which data are stored across multiple sites. The key observation is that, for the purpose of classification, it suffices to recover the discriminant subspace, which is invariant to orthogonal transformations. Theoretically, we establish statistical properties ensuring that the distributed sparse multicategory linear discriminant analysis performs as well as the centralized version after a few rounds of communication. Numerical studies lend strong support to our methodology and theory.
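The observation that only the discriminant subspace matters can be illustrated with a small sketch. It assumes a nearest-centroid rule applied after projecting onto an orthonormal basis `B` of the subspace, with predictors taken as already whitened. The names `classify`, `project`, `rotation`, and the toy basis and centroids are hypothetical and are not the paper's estimator. Replacing `B` by `B Q` for an orthogonal `Q` leaves every label unchanged, because `||Q^T z|| = ||z||`.

```python
import math
import random

def matmul(A, B):
    # Plain-Python matrix product A @ B.
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def project(B, v):
    # Coordinates of v in the subspace spanned by the columns of B: B^T v.
    return [sum(B[i][j] * v[i] for i in range(len(v))) for j in range(len(B[0]))]

def classify(x, B, centroids):
    # Hypothetical rule (assumes whitened predictors): nearest centroid
    # measured by squared Euclidean distance in the projected coordinates.
    def dist2(mu):
        z = project(B, [xi - mi for xi, mi in zip(x, mu)])
        return sum(t * t for t in z)
    return min(range(len(centroids)), key=lambda k: dist2(centroids[k]))

def rotation(theta):
    # A 2 x 2 orthogonal matrix Q.
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

# Toy example: p = 4 predictors, K = 3 classes, a 2-dimensional subspace.
s = 1 / math.sqrt(2)
B = [[s, 0.0], [s, 0.0], [0.0, s], [0.0, -s]]  # orthonormal columns
centroids = [[0, 0, 0, 0], [2, 1, 0, 1], [-1, 0, 2, 0]]
rng = random.Random(0)
points = [[rng.gauss(0, 2) for _ in range(4)] for _ in range(200)]

BQ = matmul(B, rotation(0.7))  # a different basis of the same subspace
labels = [classify(x, B, centroids) for x in points]
labels_rot = [classify(x, BQ, centroids) for x in points]
print(labels == labels_rot)
```

Because the rule depends on `B` only through distances within its column space, any estimator that recovers the subspace up to rotation yields the same classifier.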