In wireless communication systems, millimeter-wave (mmWave) beam tracking is a critical task that affects both sensing and communications, because it determines how well the wireless channel is known. We consider a setup in which a Base Station (BS) must decide dynamically, in each time slot, which of three operations receives the resource: sensing (beam tracking), downlink transmission, or uplink transmission. We devise an approach based on the Proximal Policy Optimization (PPO) algorithm that selects the resource allocation and beam tracking decisions for a given time slot. The proposed framework accounts for the time-varying quality of the wireless channel and optimizes these decisions jointly. Simulation results show that the proposed method achieves a significantly lower average packet error rate (PER) than the baseline methods while also significantly reducing beam tracking overhead. We also show that the proposed PPO-based framework is an effective solution to the resource allocation problem in joint beam tracking and communication, and that it generalizes well across the stochastic behavior of the system.
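To make the decision structure concrete, the following minimal sketch shows a discrete three-way action choice together with the standard PPO clipped surrogate objective. It is illustrative only and is not the authors' implementation: the action labels, the function names `sample_action` and `ppo_clip_objective`, and the clipping value `clip_eps=0.2` are assumptions, and the actual state, reward, and action definitions (which may also include the beam choice) are not specified here.

```python
import numpy as np

# Hypothetical labels for the three operations named in the text.
ACTIONS = ("sensing", "downlink", "uplink")


def softmax(logits):
    """Numerically stable softmax over a 1-D array of policy logits."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()


def sample_action(logits, rng):
    """Sample one operation for the current slot; return (label, log-prob)."""
    probs = softmax(np.asarray(logits, dtype=float))
    idx = rng.choice(len(ACTIONS), p=probs)
    return ACTIONS[idx], float(np.log(probs[idx]))


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate objective (to be maximized).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy; advantages: advantage estimates.
    clip_eps=0.2 is an assumed, commonly used value.
    """
    new_logp = np.asarray(new_logp, dtype=float)
    old_logp = np.asarray(old_logp, dtype=float)
    advantages = np.asarray(advantages, dtype=float)
    ratio = np.exp(new_logp - old_logp)  # probability ratio r_t
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return float(np.mean(np.minimum(unclipped, clipped)))
```

In such a sketch, the clipping keeps each policy update close to the policy that collected the data, which is the usual motivation for PPO when the environment (here, the channel) is stochastic.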