Stroke is a common disabling neurological condition that affects about one-quarter of the adult population over age 25; more than half of patients still have poor outcomes, such as permanent functional dependence or even death, after the onset of acute stroke. The aim of this study is to investigate the efficacy of diffusion-weighted MRI modalities combining with structured health profile on predicting the functional outcome to facilitate early intervention. A deep fusion learning network is proposed with two-stage training: the first stage focuses on cross-modality representation learning and the second stage on classification. Supervised contrastive learning is exploited to learn discriminative features that separate the two classes of patients from embeddings of individual modalities and from the fused multimodal embedding. The network takes as the input DWI and ADC images, and structured health profile data. The outcome is the prediction of the patient needing long-term care at 3 months after the onset of stroke. Trained and evaluated with a dataset of 3297 patients, our proposed fusion model achieves 0.87, 0.80 and 80.45% for AUC, F1-score and accuracy, respectively, outperforming existing models that consolidate both imaging and structured data in the medical domain. If trained with comprehensive clinical variables, including NIHSS and comorbidities, the gain from images on making accurate prediction is not considered substantial, but significant. However, diffusion-weighted MRI can replace NIHSS to achieve comparable level of accuracy combining with other readily available clinical variables for better generalization.