In the management of computer network systems, the service function chaining (SFC) modules play an important role by generating efficient paths for network traffic through physical servers with virtualized network functions (VNF). To provide the highest quality of services, the SFC module should generate a valid path quickly even in various network topology situations including dynamic VNF resources, various requests, and changes of topologies. The previous supervised learning method demonstrated that the network features can be represented by graph neural networks (GNNs) for the SFC task. However, the performance was limited to only the fixed topology with labeled data. In this paper, we apply reinforcement learning methods for training models on various network topologies with unlabeled data. In the experiments, compared to the previous supervised learning method, the proposed methods demonstrated remarkable flexibility in new topologies without re-designing and re-training, while preserving a similar level of performance.