Multi-band operation in wireless networks can improve data rates by leveraging the benefits of propagation in different frequency ranges. Distinctive beam management procedures in different bands complicate band assignment because they require considering not only the channel quality but also the associated beam management overhead. Reinforcement learning (RL) is a promising approach for multi-band operation as it enables the system to learn and adjust its behavior through environmental feedback. In this paper, we formulate a sequential decision problem to jointly perform band assignment and beam management. We propose a method based on hierarchical RL (HRL) to handle the complexity of the problem by separating the policies for band selection and beam management. We evaluate the proposed HRL-based algorithm on a realistic channel generated based on ray-tracing simulators. Our results show that the proposed approach outperforms traditional RL approaches in terms of reduced beam training overhead and increased data rates under a realistic vehicular channel.