When should an online reinforcement learning-based frequency agile cognitive radar be expected to outperform a rule-based adaptive waveform selection strategy? We seek insight regarding this question by examining a dynamic spectrum access scenario, in which the radar wishes to transmit in the widest unoccupied bandwidth during each pulse repetition interval. Online learning is compared to a fixed rule-based sense-and-avoid strategy. We show that given a simple Markov channel model, the problem can be examined analytically for simple cases via stochastic dominance. Additionally, we show that for more realistic channel assumptions, learning-based approaches demonstrate greater ability to generalize. However, for short time-horizon problems that are well-specified, we find that machine learning approaches may perform poorly due to the inherent limitation of convergence time. We draw conclusions as to when learning-based approaches are expected to be beneficial and provide guidelines for future study.