In recent years, convolutional neural network has gained popularity in many engineering applications especially for computer vision. In order to achieve better performance, often more complex structures and advanced operations are incorporated into the neural networks, which results very long inference time. For time-critical tasks such as autonomous driving and virtual reality, real-time processing is fundamental. In order to reach real-time process speed, a light-weight, high-throughput CNN architecture namely RoadNet-RT is proposed for road segmentation in this paper. It achieves 90.33% MaxF score on test set of KITTI road segmentation task and 8 ms per frame when running on GTX 1080 GPU. Comparing to the state-of-the-art network, RoadNet-RT speeds up the inference time by a factor of 20 at the cost of only 6.2% accuracy loss. For hardware design optimization, several techniques such as depthwise separable convolution and non-uniformed kernel size convolution are customized designed to further reduce the processing time. The proposed CNN architecture has been successfully implemented on an FPGA ZCU102 MPSoC platform that achieves the computation capability of 83.05 GOPS. The system throughput reaches 327.9 frames per second with image size 1216x176.