AI-assisted characterization of chest x-rays (CXRs) has the potential to provide substantial benefits across many clinical applications. Many large-scale public CXR datasets have been curated for the detection of abnormalities using deep learning. However, each of these datasets focuses on detecting only a subset of the disease labels that could be present in a CXR, limiting their clinical utility. Furthermore, the distributed nature of these datasets, along with data sharing regulations, makes it difficult to share them and to build a complete representation of disease labels. We propose surgical aggregation, a federated learning framework for aggregating knowledge from distributed datasets with different disease labels into a 'global' deep learning model. We randomly divided the NIH Chest X-Ray 14 dataset into training (70%), validation (10%), and test (20%) splits with no patient overlap and conducted two experiments. In the first experiment, we pruned the disease labels to create two 'toy' datasets containing 11 and 8 labels, respectively, with 4 overlapping labels. In the second experiment, we pruned the disease labels to create two disjoint 'toy' datasets with 7 labels each. The surgically aggregated 'global' model achieved excellent performance in both experiments when compared to a 'baseline' model trained on the complete set of disease labels: the overlapping and disjoint experiments yielded AUROCs of 0.87 and 0.86, respectively, against a baseline AUROC of 0.87. We then used surgical aggregation to harmonize the NIH Chest X-Ray 14 and CheXpert datasets into a single 'global' model, which achieved AUROCs of 0.85 and 0.83 on the two datasets, respectively. Our results show that surgical aggregation can be used to develop clinically useful deep learning models by aggregating knowledge from distributed datasets with diverse tasks, a step toward bridging the gap from bench to bedside.
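
As a concrete illustration of the patient-level split described above (70%/10%/20% with no patient overlap), here is a minimal sketch using scikit-learn's GroupShuffleSplit. The metadata filename and column names follow the public NIH Chest X-Ray 14 release, but the path and random seed are assumptions, not details from the original text.

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Metadata CSV shipped with the NIH Chest X-Ray 14 release (path is an assumption).
df = pd.read_csv("Data_Entry_2017.csv")

# Split off the 20% test set, grouping by patient so that no patient's images
# end up in more than one split.
gss_test = GroupShuffleSplit(n_splits=1, test_size=0.20, random_state=42)
trainval_idx, test_idx = next(gss_test.split(df, groups=df["Patient ID"]))
trainval, test = df.iloc[trainval_idx], df.iloc[test_idx]

# Carve 10% of the full dataset (0.10 / 0.80 of the remainder) out as the
# validation set, again grouping by patient.
gss_val = GroupShuffleSplit(n_splits=1, test_size=0.10 / 0.80, random_state=42)
train_idx, val_idx = next(gss_val.split(trainval, groups=trainval["Patient ID"]))
train, val = trainval.iloc[train_idx], trainval.iloc[val_idx]
```

Grouping both splits by "Patient ID" is what guarantees the no-patient-overlap property; a plain row-wise shuffle would leak images of the same patient across splits.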
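The abstract does not specify the mechanism behind surgical aggregation, so the following is only a hedged sketch of one common way to train a single multi-label model over the union of label sets across sites: each site masks its loss to the labels it actually annotates, and a FedAvg-style server averages the resulting weights. All names here (union_labels, masked_bce_loss, fedavg, site_mask) are hypothetical and should not be read as the authors' implementation.

```python
import torch
import torch.nn.functional as F

# Hypothetical union of the sites' label sets (toy example).
union_labels = ["Atelectasis", "Cardiomegaly", "Effusion", "Pneumonia"]

def masked_bce_loss(logits, targets, label_mask):
    """Binary cross-entropy over the global label space, counting only the
    labels that this site actually annotates.

    logits, targets: (batch, num_union_labels)
    label_mask:      (num_union_labels,) with 1 where this site has ground
                     truth and 0 elsewhere.
    """
    per_label = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    per_label = per_label * label_mask  # zero out unsupervised labels
    denom = (label_mask.sum() * logits.size(0)).clamp(min=1)
    return per_label.sum() / denom

def fedavg(state_dicts):
    """Plain FedAvg: equally weighted average of the sites' model weights."""
    avg = {}
    for key in state_dicts[0]:
        stacked = torch.stack([sd[key].float() for sd in state_dicts])
        avg[key] = stacked.mean(dim=0).to(state_dicts[0][key].dtype)
    return avg

# Example: a site that only annotates the first two labels.
site_mask = torch.tensor([1.0, 1.0, 0.0, 0.0])
logits = torch.randn(8, len(union_labels))
targets = torch.randint(0, 2, (8, len(union_labels))).float()
loss = masked_bce_loss(logits, targets, site_mask)
```

Masking the loss rather than the labels lets a site with a partial label set still update the shared backbone without being penalized on predictions it has no ground truth for.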