Abstract:This paper investigates the network-assisted full-duplex (NAFD) cell-free millimeter-wave (mmWave) networks, where the distribution of the transmitting access points (T-APs) and receiving access points (R-APs) across distinct geographical locations mitigates cross-link interference, facilitating the attainment of a truly flexible duplex mode. To curtail deployment expenses and power consumption for mmWave band operations, each AP incorporates a hybrid digital-analog structure encompassing precoder/combiner functions. However, this incorporation introduces processing intricacies within channel estimation and precoding/combining design. In this paper, we first present a hybrid multiple-input multiple-output (MIMO) processing framework and derive explicit expressions for both uplink and downlink achievable rates. Then we formulate a power allocation problem to maximize the weighted bidirectional sum rates. To tackle this non-convex problem, we develop a collaborative multi-agent deep reinforcement learning (MADRL) algorithm called multi-agent twin delayed deep deterministic policy gradient (MATD3) for NAFD cell-free mmWave networks. Specifically, given the tightly coupled nature of both uplink and downlink power coefficients in NAFD cell-free mmWave networks, the MATD3 algorithm resolves such coupled conflicts through an interactive learning process between agents and the environment. Finally, the simulation results validate the effectiveness of the proposed channel estimation methods within our hybrid MIMO processing paradigm, and demonstrate that our MATD3 algorithm outperforms both multi-agent deep deterministic policy gradient (MADDPG) and conventional power allocation strategies.