In the Emergency Department (ED), accurate prediction of critical events from Electronic Health Records (EHR) enables timely intervention and effective resource allocation. Although many studies have proposed automatic prediction methods, their coarse-grained time resolution limits their practical use. In this study, we therefore propose an hourly prediction method for critical events in the ED, namely mortality and vasopressor need. Through extensive experiments, we show that both 1) bi-modal fusion of EHR text and time-series data and 2) self-supervised predictive regularization, using an L2 loss between the normalized context vector and future EHR time-series data, improve predictive performance, especially for far-future prediction. In terms of AUROC, our uni-modal, bi-modal, and bi-modal self-supervised models scored 0.846/0.877/0.897 for mortality prediction (0.824/0.855/0.886 for far-future mortality) and 0.817/0.820/0.858 for vasopressor need prediction (0.807/0.81/0.855 for far-future vasopressor need), respectively.
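As a rough illustration of the self-supervised predictive regularization term, the following is a minimal sketch that computes an L2 loss between an L2-normalized context vector and the future time-series values, then adds it to the supervised task loss. It assumes that the context vector has already been projected to the same length as the flattened future window, that "L2 loss" means the mean squared error, and that the two losses are combined with a weight `lam`. The function names and the weighting scheme are hypothetical and are not taken from the authors' implementation.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Scale a vector to unit L2 norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def predictive_regularization_loss(context: Sequence[float],
                                   future: Sequence[float]) -> float:
    """Mean squared error between the normalized context vector and the
    (flattened) future time-series window.

    Assumption: the context vector was already projected to len(future).
    """
    if len(context) != len(future):
        raise ValueError("context and future must have the same length")
    z = l2_normalize(context)
    return sum((a - b) ** 2 for a, b in zip(z, future)) / len(z)


def total_loss(task_loss: float, context: Sequence[float],
               future: Sequence[float], lam: float = 0.1) -> float:
    """Supervised task loss plus the weighted self-supervised regularizer.

    `lam` is a hypothetical weighting hyperparameter.
    """
    return task_loss + lam * predictive_regularization_loss(context, future)


if __name__ == "__main__":
    # Context [3, 4] normalizes to [0.6, 0.8]; distance to [0, 0] is
    # (0.36 + 0.64) / 2 = 0.5, so the total is 1.0 + 0.1 * 0.5.
    print(total_loss(1.0, [3.0, 4.0], [0.0, 0.0], lam=0.1))
```

In a real model the same computation would run on batched tensors inside the training loop, so that the gradient of the regularizer shapes the context representation used for the hourly predictions.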