Nowadays, the PQ flexibility from the distributed energy resources (DERs) in the high voltage (HV) grids plays a more critical and significant role in grid congestion management in TSO grids. This work proposed a multi-stage deep reinforcement learning approach to estimate the PQ flexibility (PQ area) at the TSO-DSO interfaces and identifies the DER PQ setpoints for each operating point in a way, that DERs in the meshed HV grid can be coordinated to offer flexibility for the transmission grid. In the estimation process, we consider the steady-state grid limits and the robustness in the resulting voltage profile against uncertainties and the N-1 security criterion regarding thermal line loading, essential for real-life grid operational planning applications. Using deep reinforcement learning (DRL) for PQ flexibility estimation is the first of its kind. Furthermore, our approach of considering N-1 security criterion for meshed grids and robustness against uncertainty directly in the optimization tasks offers a new perspective besides the common relaxation schema in finding a solution with mathematical optimal power flow (OPF). Finally, significant improvements in the computational efficiency in estimation PQ area are the highlights of the proposed method.