Perimeter Control (PC) strategies have been proposed to address urban road network control in oversaturated situations by monitoring transfer flows of the Protected Network (PN). The uniform metering rate for cordon signals in existing studies ignores the variety of local traffic states at the intersection level, which may cause severe local traffic congestion and ruin the network stability. This paper introduces a semi-model dependent Multi-Agent Reinforcement Learning (MARL) framework to conduct PC with heterogeneous cordon signal behaviors. The proposed strategy integrates the MARL-based signal control method with centralized feedback PC policy and is applied to cordon signals of the PN. It operates as a two-stage system, with the feedback PC strategy detecting the overall traffic state within the PN and then distributing local instructions to cordon signals controlled by agents in the MARL framework. Each cordon signal acts independently and differently, creating a slack and distributed PC for the PN. The combination of the model-free and model-based methods is achieved by reconstructing the action-value function of the local agents with PC feedback reward without violating the integrity of the local signal control policy learned from the RL training process. Through numerical tests with different demand patterns in a microscopic traffic environment, the proposed PC strategy (a) is shown robustness, scalability, and transferability, (b) outperforms state-of-the-art model-based PC strategies in increasing network throughput, reducing cordon queue and carbon emission.