The growing demand for high-quality, low-latency multimedia services has generated considerable interest in edge caching techniques. Motivated by this, in this paper we consider edge caching at base stations when the content popularity distributions are unknown. To solve the resulting dynamic control problem of making caching decisions, we propose a multi-agent framework based on deep actor-critic reinforcement learning, with the aim of minimizing the overall average transmission delay. To evaluate the proposed framework, we compare its learned performance against three classical caching policies: least recently used (LRU), least frequently used (LFU), and first-in-first-out (FIFO). Simulation results show that the proposed framework outperforms all three caching algorithms and demonstrate its superior ability to adapt to varying environments.
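For concreteness, the three baseline eviction policies can be sketched as follows (a minimal Python illustration; the cache capacity, request stream, and class/method names are assumptions made for this sketch, not details from the paper):

```python
from collections import OrderedDict, Counter, deque

# Illustrative sketches of the three baseline eviction policies.
# request(item) returns True on a cache hit, False on a miss.

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()

    def request(self, item):
        if item in self.store:
            self.store.move_to_end(item)   # mark as most recently used
            return True
        if len(self.store) >= self.capacity:
            self.store.popitem(last=False)  # evict least recently used
        self.store[item] = True
        return False

class FIFOCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()
        self.members = set()

    def request(self, item):
        if item in self.members:
            return True
        if len(self.queue) >= self.capacity:
            evicted = self.queue.popleft()  # evict oldest insertion
            self.members.discard(evicted)
        self.queue.append(item)
        self.members.add(item)
        return False

class LFUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = Counter()
        self.members = set()

    def request(self, item):
        self.counts[item] += 1
        if item in self.members:
            return True
        if len(self.members) >= self.capacity:
            # evict the cached item with the fewest accesses so far
            victim = min(self.members, key=lambda x: self.counts[x])
            self.members.discard(victim)
        self.members.add(item)
        return False

def hit_rate(cache, requests):
    """Fraction of requests served from the cache."""
    hits = sum(cache.request(r) for r in requests)
    return hits / len(requests)
```

Unlike these fixed rules, a learning-based policy can adapt its caching decisions as the (unknown) popularity distribution drifts, which is the property the proposed framework is evaluated on.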