Due to the distinct objectives and multipath utilization mechanisms between the communication module and radar module, the system design of integrated sensing and communication (ISAC) necessitates two types of channel state information (CSI), i.e., communication CSI representing the whole channel gain and phase shifts, and radar CSI exclusively focused on target mobility and position information. However, current ISAC systems apply an identical mechanism to estimate both types of CSI at the same predetermined estimation interval, leading to significant overhead and compromised performances. Therefore, this paper proposes an intermittent communication and radar CSI estimation scheme with adaptive intervals for individual users/targets, where both types of CSI can be predicted using channel temporal correlations for cost reduction or re-estimated via training signal transmission for improved estimation accuracy. Specifically, we jointly optimize the binary CSI re-estimation/prediction decisions and transmit beamforming matrices for individual users/targets to maximize communication transmission rates and minimize radar tracking errors and costs in a multiple-input single-output (MISO) ISAC system. Unfortunately, this problem has causality issues because it requires comparing system performances under re-estimated CSI and predicted CSI during the optimization. Additionally, the binary decision makes the joint design a mixed integer nonlinear programming (MINLP) problem, resulting in high complexity when using conventional optimization algorithms. Therefore, we propose a deep reinforcement online learning (DROL) framework that first implements an online deep neural network (DNN) to learn the binary CSI updating decisions from the experiences. Given the learned decisions, we propose an efficient algorithm to solve the remaining beamforming design problem efficiently.