This paper introduces a novel approach for 3D semantic instance segmentation on point clouds. A 3D convolutional neural network called submanifold sparse convolutional network is used to generate semantic predictions and instance embeddings simultaneously. To obtain discriminative embeddings for each 3D instance, a structure-aware loss function is proposed which considers both the structure information and the embedding information. To get more consistent embeddings for each 3D instance, attention-based k nearest neighbour (KNN) is proposed to assign different weights for different neighbours. Based on the attention-based KNN, we add a graph convolutional network after the sparse convolutional network to get refined embeddings. Our network can be trained end-to-end. A simple mean-shift algorithm is utilized to cluster refined embeddings to get final instance predictions. As a result, our framework can output both the semantic prediction and the instance prediction. Experiments show that our approach outperforms all state-of-art methods on ScanNet benchmark and NYUv2 dataset.