In Joint Communication and Radar (JCR)-based Autonomous Vehicle (AV) systems, optimizing the waveform structure is one of the most challenging tasks because the radar and data communication functions strongly influence each other. Specifically, the preamble of a data communication frame is typically leveraged for the radar function. As such, the more preambles a Coherent Processing Interval (CPI) contains, the better the radar performance. In contrast, communication efficiency decreases as the number of preambles increases. Moreover, the radio environments surrounding AVs are usually dynamic and highly uncertain due to the vehicles' high mobility, making the JCR waveform optimization problem even more challenging. To that end, this paper develops a novel JCR framework based on the Markov decision process and recent advances in deep reinforcement learning. With this framework, the JCR-AV can adaptively optimize its waveform structure (i.e., the number of frames in the CPI) to maximize radar and data communication performance under the dynamics and uncertainty of the surrounding environment, without requiring complete knowledge of that environment in advance. Extensive simulations show that the proposed approach can improve the joint communication and radar performance by up to 46.26% compared with conventional methods (e.g., greedy policy-based and fixed waveform-based approaches).