While several matrix factorization (MF) and tensor factorization (TF) models have been proposed for knowledge base (KB) inference, they have rarely been compared across various datasets. Is there a single model that performs well across datasets? If not, what characteristics of a dataset determine the performance of MF and TF models? Is there a joint TF+MF model that performs robustly on all datasets? We perform an extensive evaluation to compare popular KB inference models across popular datasets in the literature. In addition to answering the questions above, we remove a limitation in the standard evaluation protocol for MF models, propose an extension to MF models so that they can better handle out-of-vocabulary (OOV) entity pairs, and develop a novel combination of TF and MF models. We also analyze and explain the results based on models and dataset characteristics. Our best model is robust, and obtains strong results across all datasets.