This paper describes important considerations and challenges associated with online reinforcement-learning based waveform selection for target identification in frequency modulated continuous wave (FMCW) automotive radar systems. We present a novel learning approach based on satisficing Thompson sampling, which quickly identifies a waveform expected to yield satisfactory classification performance. We demonstrate through measurement-level simulations that effective waveform selection strategies can be quickly learned, even in cases where the radar must select from a large catalog of candidate waveforms. The radar learns to adaptively select a bandwidth for appropriate resolution and a slow-time unimodular code for interference mitigation in the scene of interest by optimizing an expected classification metric.