As several new spectrum bands open up for shared use, a new paradigm of \textit{Diverse Band-aware Dynamic Spectrum Access} (d-DSA) has emerged. d-DSA equips a secondary device with software-defined radios (SDRs) and utilizes whitespaces (or idle channels) in \textit{multiple bands}, including but not limited to TV, LTE, Citizens Broadband Radio Service (CBRS), and unlicensed ISM. In this paper, we propose a decentralized, online, multi-agent reinforcement learning based cross-layer BAnd selection and Routing Design (BARD) for such d-DSA networks. BARD not only harnesses whitespaces in multiple spectrum bands, but also accounts for the unique electromagnetic characteristics of those bands to best meet the desired quality of service (QoS) requirements of heterogeneous message packets, while ensuring no harmful interference to the primary users in the utilized band. Our extensive experiments demonstrate that BARD outperforms the baseline dDSAaR algorithm in terms of message delivery ratio, albeit at a relatively higher network latency, for varying numbers of primary and secondary users. Furthermore, BARD greatly outperforms its single-band DSA variants in terms of both metrics in all considered scenarios.