Predicting user positive response (e.g., purchases and clicks) probability is a critical task in Web applications. To identify predictive features from raw data, the state-of-the-art extreme deep factorization machine (xDeepFM) model introduces a compressed interaction network (CIN) to leverage feature interactions at the vector-wise level explicitly. However, since each hidden layer in CIN is a collection of feature maps, it can be viewed essentially as an ensemble of different feature maps. In this case, only using a single objective to minimize the prediction loss may lead to overfitting. In this paper, an ensemble diversity enhanced extreme deep factorization machine model (DexDeepFM) is proposed, which introduces the ensemble diversity measure in CIN and considers both ensemble diversity and prediction accuracy in the objective function. In addition, the attention mechanism is introduced to discriminate the importance of ensemble diversity measures with different feature interaction orders. Extensive experiments on two public real-world datasets show the superiority of the proposed model.