We present CT-Bound, a fast boundary estimation method for noisy images using a hybrid Convolution and Transformer neural network. The proposed architecture decomposes boundary estimation into two tasks: local detection and global regularization of image boundaries. It first estimates a parametric representation of boundary structures only using the input image within a small receptive field and then refines the boundary structure in the parameter domain without accessing the input image. Because of this, a part of the network can be easily trained using naive, synthetic images and still generalized to real images, and the entire architecture is computationally efficient as the boundary refinement is non-iterative and not in the image domain. Compared with the previous highest accuracy methods, our experiment shows that CT-Bound is 100 times faster, producing comparably accurate, high-quality boundary and color maps. We also demonstrate that CT-Bound can produce boundary and color maps on real captured images without extra fine-tuning and real-time boundary map and color map videos at ten frames per second.