Radiotherapy is the main treatment modality for nasopharynx cancer. Delineation of Gross Target Volume (GTV) from medical images such as CT and MRI images is a prerequisite for radiotherapy. As manual delineation is time-consuming and laborious, automatic segmentation of GTV has a potential to improve this process. Currently, most of the deep learning-based automatic delineation methods of GTV are mainly performed on medical images like CT images. However, it is challenged by the low contrast between the pathology regions and surrounding soft tissues, small target region, and anisotropic resolution of clinical CT images. To deal with these problems, we propose a 2.5D Convolutional Neural Network (CNN) to handle the difference of inplane and through-plane resolution. Furthermore, we propose a spatial attention module to enable the network to focus on small target, and use channel attention to further improve the segmentation performance. Moreover, we use multi-scale sampling method for training so that the networks can learn features at different scales, which are combined with a multi-model ensemble method to improve the robustness of segmentation results. We also estimate the uncertainty of segmentation results based on our model ensemble, which is of great importance for indicating the reliability of automatic segmentation results for radiotherapy planning.