As renewable energy resources replace fossil fuels, the unbalanced production of intermittent wind and photovoltaic (PV) power becomes a critical issue for peer-to-peer (P2P) power trading. To resolve this problem, this paper introduces a reinforcement learning (RL) technique in which a graph convolutional network (GCN) and a bi-directional long short-term memory (Bi-LSTM) network are jointly applied to P2P power trading between nanogrid clusters based on cooperative game theory. The flexible and reliable DC nanogrid is well suited to integrating renewable energy into the distribution system. Each local nanogrid cluster acts as a prosumer, producing and consuming power simultaneously. For power management, multi-objective optimization is applied to each local nanogrid cluster using Internet of Things (IoT) technology. Electric vehicle (EV) charging and discharging is scheduled with the intermittent characteristics of wind and PV power production taken into account. The RL algorithms used in the simulations are deep Q-network (DQN), deep recurrent Q-network (DRQN), Bi-DRQN, proximal policy optimization (PPO), GCN-DQN, GCN-DRQN, GCN-Bi-DRQN, and GCN-PPO. The resulting cooperative P2P power trading system maximizes profit under the time-of-use (ToU) tariff-based electricity cost and the system marginal price (SMP), while minimizing the amount of power drawn from the grid. Power management of nanogrid clusters with P2P power trading is simulated in real time on a distribution test feeder, and the proposed GCN-PPO technique reduces the electricity cost of the nanogrid clusters by 36.7%.
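The abstract does not specify the GCN formulation. The following is a minimal sketch that assumes the widely used symmetric-normalized propagation rule, ReLU(D^{-1/2}(A + I)D^{-1/2} H W). It shows how one graph-convolution layer could mix state information across neighboring nanogrid clusters before that state is passed to an RL agent such as PPO. The feeder topology, the per-cluster features (PV, wind, load), and the weight values are hypothetical placeholders, not values from the paper. The Bi-LSTM and PPO components are omitted.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of a (n x k) and b (k x m)."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} (assumed GCN rule)."""
    n = len(adj)
    # Add self-loops so each cluster keeps its own state during propagation.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj: Matrix, features: Matrix, weights: Matrix) -> Matrix:
    """One graph-convolution layer: ReLU(A_norm . H . W)."""
    propagated = matmul(normalized_adjacency(adj), features)
    return [[max(0.0, v) for v in row] for row in matmul(propagated, weights)]


# Hypothetical 3-cluster feeder: cluster 1 is linked to clusters 0 and 2.
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
# Hypothetical per-cluster features: [PV output kW, wind output kW, load kW].
features = [[4.0, 1.0, 3.0], [0.5, 2.5, 5.0], [3.0, 0.0, 2.0]]
# Untrained placeholder weights mapping 3 input features to 2 embedding dims.
weights = [[0.1, -0.2], [0.3, 0.1], [-0.1, 0.2]]

# One embedding row per cluster, e.g. as input to a downstream policy network.
embeddings = gcn_layer(adjacency, features, weights)
```

In a full agent, these weights would be learned jointly with the policy, and several such layers might be stacked. The self-loop term keeps each cluster's own production and load in its embedding, while the normalization keeps highly connected clusters from dominating the aggregated state.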