We present a robust visual-inertial SLAM system that combines the benefits of Convolutional Neural Networks (CNNs) and planar constraints. Our system leverages a CNN to predict the depth map and the corresponding uncertainty map for each image. The CNN depth effectively bootstraps the back-end optimization of SLAM and meanwhile the CNN uncertainty adaptively weighs the contribution of each feature point to the back-end optimization. Given the gravity direction from the inertial sensor, we further present a fast plane detection method that detects horizontal planes via one-point RANSAC and vertical planes via two-point RANSAC. Those stably detected planes are in turn used to regularize the back-end optimization of SLAM. We evaluate our system on a public dataset, \ie, EuRoC, and demonstrate improved results over a state-of-the-art SLAM system, \ie, ORB-SLAM3.