In this paper we propose a novel mathematical optimization based methodology to construct classification forests. A given number of trees are simultaneously constructed, each of them providing a predicted class for each of the observations in the training dataset. An observation is then classified to its most frequently predicted class. We give a mixed integer linear programming formulation for the problem. We report the results of our computational experiments. Our proposed method outperforms state-of-the-art tree-based classification methods on several standard datasets.