'Radiomics' is a method that extracts mineable quantitative features from radiographic images. These features can then be used to determine prognosis, for example, predicting the development of distant metastases (DM). Existing radiomics methods, however, require complex manual effort including the design of hand-crafted radiomic features and their extraction and selection. Recent radiomics methods, based on convolutional neural networks (CNNs), also require manual input in network architecture design and hyper-parameter tuning. Radiomic complexity is further compounded when there are multiple imaging modalities, for example, combined positron emission tomography - computed tomography (PET-CT) where there is functional information from PET and complementary anatomical localization information from computed tomography (CT). Existing multi-modality radiomics methods manually fuse the data that are extracted separately. Reliance on manual fusion often results in sub-optimal fusion because they are dependent on an 'expert's' understanding of medical images. In this study, we propose a multi-modality neural architecture search method (MM-NAS) to automatically derive optimal multi-modality image features for radiomics and thus negate the dependence on a manual process. We evaluated our MM-NAS on the ability to predict DM using a public PET-CT dataset of patients with soft-tissue sarcomas (STSs). Our results show that our MM-NAS had a higher prediction accuracy when compared to state-of-the-art radiomics methods.