Intelligent wireless networks have long been expected to have self-configuration and self-optimization capabilities to adapt to various environments and demands. In this paper, we develop a novel distributed hierarchical deep reinforcement learning (DHDRL) framework with two-tier control networks in different timescales to optimize the long-term spectrum efficiency (SE) of the downlink cell-free multiple-input single-output (MISO) network, consisting of multiple distributed access points (AP) and user terminals (UT). To realize the proposed two-tier control strategy, we decompose the optimization problem into two sub-problems, AP-UT association (AUA) as well as beamforming and power allocation (BPA), resulting in a Markov decision process (MDP) and Partially Observable MDP (POMDP). The proposed method consists of two neural networks. At the system level, a distributed high-level neural network is introduced to optimize wireless network structure on a large timescale. While at the link level, a distributed low-level neural network is proposed to mitigate inter-AP interference and improve the transmission performance on a small timescale. Numerical results show that our method is effective for high-dimensional problems, in terms of spectrum efficiency, signaling overhead as well as satisfaction probability, and generalize well to diverse multi-object problems.