Characterizing the dynamic interactive patterns of complex systems helps gain in-depth understanding of how components interrelate with each other while performing certain functions as a whole. In this study, we present a novel multimodal data fusion approach to construct a complex network, which models the interactions of biological subsystems in the human body under emotional states through physiological responses. Joint recurrence plot and temporal network metrics are employed to integrate the multimodal information at the signal level. A benchmark public dataset of is used for evaluating our model.