Efficient and accurate traffic monitoring systems often take advantage of multi-sensor detection to ensure the safety of urban traffic, improving the accuracy and robustness of target detection and tracking. This paper proposes a target detection method based on the Radar-Vision Fusion Path Aggregation Fully Convolutional One-Stage Network (RV-PAFCOS), which extends the Fully Convolutional One-Stage Network (FCOS) with three modules: a radar image processing branch, radar-vision fusion, and path aggregation. The radar image processing branch converts radar point clouds into radar images, relying on the spatiotemporal calibration of the millimeter-wave (mmw) radar and the camera. The fusion module extracts features from the radar and optical images based on a spatial attention stitching criterion. The path aggregation module enhances the reuse of feature layers, combining the positional information of shallow feature maps with deep semantic information to obtain better detection performance for both large and small targets. Experimental analysis shows that the proposed method effectively fuses mmw radar and vision perception and performs well in traffic target detection.
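To make the conversion of radar point clouds to radar images more concrete, the following is a minimal sketch of one common way to do it, not the implementation used in this paper. It assumes a pinhole camera with intrinsic matrix `K` and radar-to-camera extrinsics `R`, `t` from spatial calibration, and it ignores temporal alignment. The function name `radar_points_to_image`, its parameters, and the choice of one scalar value per point (for example range or radial velocity) are all illustrative assumptions.

```python
import numpy as np


def radar_points_to_image(points, values, K, R, t, height, width):
    """Project radar points of shape (N, 3) into a (height, width) image.

    Assumptions (not taken from the paper): K is a 3x3 pinhole intrinsic
    matrix, R (3x3) and t (3,) map radar coordinates to camera coordinates,
    and `values` holds one scalar per point to store at the hit pixel.
    """
    points = np.asarray(points, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    image = np.zeros((height, width), dtype=np.float64)

    # Radar frame -> camera frame: X_cam = R @ X_radar + t
    cam = points @ np.asarray(R).T + np.asarray(t)

    # Keep only points in front of the camera (positive depth).
    in_front = cam[:, 2] > 0
    cam, values = cam[in_front], values[in_front]

    # Perspective projection to pixel coordinates (u, v).
    uvw = cam @ np.asarray(K).T
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)

    # Discard projections that fall outside the image bounds.
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    image[v[inside], u[inside]] = values[inside]
    return image
```

In this sketch each radar point marks a single pixel. A sparse image of this kind can then be fed to a convolutional branch alongside the optical image, and how the points are rendered (single pixels, vertical lines, or blobs) is a design choice left open here.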