We study differentially private (DP) federated learning (FL) with non-convex loss functions and heterogeneous (non-i.i.d.) client data in the absence of a trusted server, both with and without a secure "shuffler" to anonymize client reports. We propose novel algorithms that satisfy local differential privacy (LDP) at the client level and shuffle differential privacy (SDP) for three classes of Lipschitz continuous loss functions: First, we consider losses satisfying the Proximal Polyak-Lojasiewicz (PL) inequality, which is an extension of the classical PL condition to the constrained setting. Prior works studying DP PL optimization only consider the unconstrained problem with Lipschitz loss functions, which rules out many interesting practical losses, such as strongly convex, least squares, and regularized logistic regression. However, by analyzing the proximal PL scenario, we permit such losses which are Lipschitz on a restricted parameter domain. We propose LDP and SDP algorithms that nearly attain the optimal strongly convex, homogeneous (i.i.d.) rates. Second, we provide the first DP algorithms for non-convex/non-smooth loss functions. Third, we specialize our analysis to smooth, unconstrained non-convex FL. Our bounds improve on the state-of-the-art, even in the special case of a single client, and match the non-private lower bound in certain practical parameter regimes. Numerical experiments show that our algorithm yields better accuracy than baselines for most privacy levels.