We proposed an end-to-end deep learning-based simultaneous localization and mapping (SLAM) system following conventional visual odometry (VO) pipelines. The proposed method completes the SLAM framework by including tracking, mapping, and sequential optimization networks while training them in an unsupervised manner. Together with the camera pose and depth map, we estimated the observational uncertainty to make our system robust to noises such as dynamic objects. We evaluated our method using public indoor and outdoor datasets. The experiment demonstrated that our method works well in tracking and mapping tasks and performs comparably with other learning-based VO approaches. Notably, the proposed uncertainty modeling and sequential training yielded improved generality in a variety of environments.