Proton pencil beam scanning (PBS) treatment planning for head and neck (H&N) cancers is a time-consuming, experience-dependent task involving a large number of planning objectives. Deep reinforcement learning (DRL) has recently been introduced into the planning of intensity-modulated radiation therapy and brachytherapy for prostate, lung, and cervical cancers. However, existing approaches are built on the Q-learning framework and on weighted linear combinations of clinical metrics; as a result, they scale poorly, lack flexibility, and can adjust only a limited number of planning objectives in discrete action spaces. We propose an automatic treatment planning model for proton PBS treatment of H&N cancers that uses the proximal policy optimization (PPO) algorithm and a dose distribution-based reward function. Specifically, a set of empirical rules is used to create auxiliary planning structures from target volumes and organs-at-risk (OARs), along with their associated planning objectives. These planning objectives are fed into an in-house optimization engine that generates the spot monitor unit (MU) values. A decision-making policy network, trained with PPO, iteratively adjusts the planning objective parameters in a continuous action space and refines the PBS treatment plans, guided by the dose distribution-based reward function. Proton H&N treatment plans generated by the model show improved OAR sparing with equal or superior target coverage compared with human-generated plans. Moreover, additional experiments on liver cancer demonstrate that the proposed method can generalize to other treatment sites. To the best of our knowledge, this is the first DRL-based automatic treatment planning model capable of achieving human-level performance for H&N cancers.
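
For orientation, the clipped surrogate objective that PPO is generally understood to optimize can be sketched as below. This is a minimal illustration of the standard PPO-clip loss, not the authors' implementation. The function name `ppo_clip_loss`, its arguments, and the clipping value of 0.2 are assumptions chosen for the example. The abstract does not specify the policy network, the reward computation, or any hyperparameters, so none are modeled here.

```python
import math

def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate loss, averaged over samples.

    new_logp / old_logp: log-probabilities of the sampled actions under the
    current and the data-collecting policy (hypothetical inputs; in a
    continuous action space these would come from, e.g., a Gaussian policy).
    advantages: advantage estimates for the same samples.
    clip_eps: clipping range (0.2 is a common default, assumed here).
    """
    terms = []
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the current and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio within [1 - eps, 1 + eps].
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: take the smaller of the two surrogate terms.
        terms.append(min(ratio * adv, clipped_ratio * adv))
    # Negate because optimizers minimize, while PPO maximizes the surrogate.
    return -sum(terms) / len(terms)
```

The clipping removes the incentive to move the policy far from the one that collected the data in a single update. This is one reason PPO is often preferred over Q-learning variants when the action space is continuous, as it is here.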