Supporting ultra-high data rates and flexible reconfigurability, Terahertz (THz) mesh networks are attractive for next-generation wireless backhaul systems that empower the integrated access and backhaul (IAB). In THz mesh backhaul networks, the efficient cross-layer routing and long-term resource allocation is yet an open problem due to dynamic traffic demands as well as possible link failures caused by the high directivity and high non-line-of-sight (NLoS) path loss of THz spectrum. In addition, unpredictable data traffic and the mixed integer programming property with the NP-hard nature further challenge the effective routing and long-term resource allocation design. In this paper, a deep reinforcement learning (DRL) based cross-layer design in THz mesh backhaul networks (DEFLECT) is proposed, by considering dynamic traffic demands and possible sudden link failures. In DEFLECT, a heuristic routing metric is first devised to facilitate resource efficiency (RE) enhancement regarding energy and sub-array usages. Furthermore, a DRL based resource allocation algorithm is developed to realize long-term RE maximization and fast recovery from broken links. Specifically in the DRL method, the exploited multi-task structure cooperatively benefits joint power and sub-array allocation. Additionally, the leveraged hierarchical architecture realizes tailored resource allocation for each base station and learned knowledge transfer for fast recovery. Simulation results show that DEFLECT routing consumes less resource, compared to the minimal hop-count metric. Moreover, unlike conventional DRL methods causing packet loss and second-level latency, DEFLECT DRL realizes the long-term RE maximization with no packet loss and millisecond-level latency, and recovers resource-efficient backhaul from broken links within 1s.