In recent years, Fully Convolutional Networks (FCNs) have been widely used in semantic segmentation tasks, including the segmentation of multi-modal remote sensing imagery. How to fuse multi-modal data to improve segmentation performance has long been a research hotspot. In this paper, a novel end-to-end fully convolutional neural network is proposed for semantic segmentation of natural color imagery, infrared imagery, and Digital Surface Models (DSM). It is based on a modified DeepUNet and performs the segmentation in a multi-task manner. The input channels are clustered into groups, and each group is processed on a different task pipeline. After a series of segmentation and fusion steps, the shared features and private features of the groups are merged together. Experimental results show that the feature fusion network is efficient, and our approach achieves good performance in the ISPRS Semantic Labeling Contest (2D).
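
The channel grouping and fusion idea described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the channel layout (R, G, B, infrared, DSM), the group names, and the hypothetical helpers `split_channels`, `private_features`, `shared_features`, and `fuse`. Simple statistics on a single pixel stand in for the convolutional pipelines of the actual network, and concatenation stands in for the fusion step.

```python
from typing import Dict, List

# Assumed channel layout for one pixel: R, G, B, infrared, DSM height.
# The grouping below is illustrative only, not the paper's configuration.
CHANNEL_GROUPS: Dict[str, List[int]] = {
    "color": [0, 1, 2],
    "infrared": [3],
    "dsm": [4],
}


def split_channels(pixel: List[float], groups: Dict[str, List[int]]) -> Dict[str, List[float]]:
    """Cluster the channels of one pixel into named groups."""
    return {name: [pixel[i] for i in idx] for name, idx in groups.items()}


def private_features(values: List[float]) -> List[float]:
    """Stand-in for a per-group pipeline: returns the mean and max of the group."""
    return [sum(values) / len(values), max(values)]


def shared_features(pixel: List[float]) -> List[float]:
    """Stand-in for a pipeline that sees all channels: returns the overall mean."""
    return [sum(pixel) / len(pixel)]


def fuse(pixel: List[float], groups: Dict[str, List[int]]) -> List[float]:
    """Concatenate the shared features with each group's private features."""
    grouped = split_channels(pixel, groups)
    fused = list(shared_features(pixel))
    for name in groups:
        fused.extend(private_features(grouped[name]))
    return fused


# One shared feature plus two private features for each of the three groups.
fused = fuse([0.2, 0.4, 0.6, 0.8, 1.0], CHANNEL_GROUPS)
print(len(fused))
```

In a real network each pipeline would be a stack of convolutional layers operating on feature maps rather than a per-pixel statistic; the sketch only shows how grouped inputs can yield private features that are then merged with shared ones.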