Multimodal wearable physiological data in daily life have been used to estimate self-reported stress labels. However, missing data modalities in data collection makes it challenging to leverage all the collected samples. Besides, heterogeneous sensor data and labels among individuals add challenges in building robust stress detection models. In this paper, we proposed a modality fusion network (MFN) to train models and infer self-reported binary stress labels under both complete and incomplete modality conditions. In addition, we applied personalized attention (PA) strategy to leverage personalized representation along with the generalized one-size-fits-all model. We evaluated our methods on a multimodal wearable sensor dataset (N=41) including galvanic skin response (GSR) and electrocardiogram (ECG). Compared to the baseline method using the samples with complete modalities, the performance of the MFN improved by 1.6% in f1-scores. On the other hand, the proposed PA strategy showed a 2.3% higher stress detection f1-score and approximately up to 70% reduction in personalized model parameter size (9.1 MB) compared to the previous state-of-the-art transfer learning strategy (29.3 MB).