Designing effective Unmanned Aerial Vehicle(UAV)-assisted routing protocols is challenging due to changing topology, limited battery capacity, and the dynamic nature of communication environments. Current protocols prioritize optimizing individual network parameters, overlooking the necessity for a nuanced approach in scenarios with intermittent connectivity, fluctuating signal strength, and varying network densities, ultimately failing to address aerial network requirements comprehensively. This paper proposes a novel, Improved Q-learning-based Multi-hop Routing (IQMR) algorithm for optimal UAV-assisted communication systems. Using Q(\lambda) learning for routing decisions, IQMR substantially enhances energy efficiency and network data throughput. IQMR improves system resilience by prioritizing reliable connectivity and inter-UAV collision avoidance while integrating real-time network status information, all in the absence of predefined UAV path planning, thus ensuring dynamic adaptability to evolving network conditions. The results validate IQMR's adaptability to changing system conditions and superiority over the current techniques. IQMR showcases 36.35\% and 32.05\% improvements in energy efficiency and data throughput over the existing methods.