Although the convergence of policy gradient algorithms to first-order stationary points is well established, the objective functions of reinforcement learning problems are typically highly nonconvex. Recent work has therefore pursued two extensions: ``global'' convergence guarantees under regularity assumptions on the objective's structure, and second-order guarantees for escaping saddle points and converging to true local minima. Our work builds on the latter approach, avoiding the former's restrictive assumptions, which may not hold for general objective functions. Existing results on vanilla policy gradient consider only an unbiased gradient estimator, yet practical implementations in the infinite-horizon discounted setting, including both Monte-Carlo and actor-critic methods, perform gradient updates with a biased gradient estimator. We present preliminary results on the convergence of biased policy gradient algorithms to second-order stationary points, leveraging proof techniques from nonconvex optimization. As a next step, we aim to provide the first finite-time second-order convergence analysis for actor-critic algorithms.
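To make the source of this bias concrete, the following display is a minimal sketch in our own notation (the symbols $J$, $\pi_\theta$, $Q^{\pi_\theta}$, $\gamma$, and the truncation horizon $H$ are assumptions for illustration, not taken from the text above): the exact discounted policy gradient is contrasted with a truncated Monte-Carlo estimator, whose finite rollout length introduces bias.
\[
\nabla_\theta J(\theta)
  = \mathbb{E}_{\tau \sim \pi_\theta}\!\left[\,\sum_{t=0}^{\infty} \gamma^{t}\,
    \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, Q^{\pi_\theta}(s_t, a_t)\right],
\qquad
\widehat{\nabla}_\theta J_H(\theta)
  = \sum_{t=0}^{H-1} \gamma^{t}\,
    \nabla_\theta \log \pi_\theta(a_t \mid s_t)
    \sum_{t'=t}^{H-1} \gamma^{\,t'-t}\, r_{t'}.
\]
In general $\mathbb{E}\big[\widehat{\nabla}_\theta J_H(\theta)\big] \neq \nabla_\theta J(\theta)$, and the bias vanishes only as $H \to \infty$; actor-critic methods incur an analogous bias through the approximation error of the learned critic standing in for $Q^{\pi_\theta}$.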