Despite the potential of weakly supervised learning to automatically annotate massive amounts of data, little is known about its limitations for use in computer-aided diagnosis (CAD). For CT specifically, interpreting the performance of CAD algorithms can be challenging given the large number of co-occurring diseases. This paper examines the effect of co-occurring diseases when training classification models by weakly supervised learning, specifically by comparing multi-label and multiple binary classifiers using the same training data. Our results demonstrated that the binary model outperformed the multi-label classification in every disease category in terms of AUC. However, this performance was heavily influenced by co-occurring diseases in the binary model, suggesting it did not always learn the correct appearance of the specific disease. For example, binary classification of lung nodules resulted in an AUC of < 0.65 when there were no other co-occurring diseases, but when lung nodules co-occurred with emphysema, the performance reached AUC > 0.80. We hope this paper revealed the complexity of interpreting disease classification performance in weakly supervised models and will encourage researchers to examine the effect of co-occurring diseases on classification performance in the future.