We propose a multimodal data fusion method that constructs an $(M+1)$-dimensional tensor to capture the high-order relationships between $M$ modalities and the output layer of a neural network model. Applying a modality-based tensor factorization method, which assigns a different factor to each modality, removes information that is redundant with respect to the model output and leads to fewer model parameters with minimal loss of performance. The factorization acts as a regularizer, yielding a less complicated model and reducing overfitting. In addition, the modality-based factorization reveals how much useful information each modality carries. We apply this method to three multimodal datasets for sentiment analysis, personality trait recognition, and emotion recognition. On all three tasks, the results demonstrate a 1\% to 4\% improvement over the state of the art on several evaluation measures.
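As a minimal sketch of this construction (the notation here is assumed for illustration and is not fixed by the abstract: $h_m \in \mathbb{R}^{d_m}$ denotes the representation of modality $m$, $h$ the output, $\mathcal{W}$ the weight tensor, and $r_m$ a per-modality rank), the fused tensor can be formed as the outer product of the modality representations, each augmented with a constant $1$, and one way to realize a modality-based factorization of the resulting $(M+1)$-way weight tensor is a Tucker-style decomposition with a separate rank for each modality:
\begin{equation*}
\mathcal{Z} \;=\; \begin{bmatrix} h_1 \\ 1 \end{bmatrix} \otimes \begin{bmatrix} h_2 \\ 1 \end{bmatrix} \otimes \cdots \otimes \begin{bmatrix} h_M \\ 1 \end{bmatrix},
\qquad
h \;=\; \mathcal{W} \cdot \mathcal{Z},
\qquad
\mathcal{W} \;\approx\; \mathcal{G} \times_1 U_1 \times_2 U_2 \,\cdots\, \times_M U_M,
\end{equation*}
where $\mathcal{W} \cdot \mathcal{Z}$ contracts the $M$ modality modes, $\mathcal{G}$ is a core tensor, and $U_m \in \mathbb{R}^{(d_m+1) \times r_m}$ is the factor for modality $m$. Choosing $r_m$ smaller than $d_m+1$ compresses that modality, and the rank needed to preserve performance indicates how much useful information the modality contributes.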