In this work, we study high-dimensional mean estimation under user-level differential privacy, and attempt to design an $(\epsilon,\delta)$-differentially private mechanism using as few users as possible. In particular, we provide a nearly optimal trade-off between the number of users and the number of samples per user required for private mean estimation, even when the number of users is as low as $O(\frac{1}{\epsilon}\log\frac{1}{\delta})$. Interestingly our bound $O(\frac{1}{\epsilon}\log\frac{1}{\delta})$ on the number of users is independent of the dimension, unlike the previous work that depends polynomially on the dimension, solving a problem left open by Amin et al.~(ICML'2019). Our mechanism enjoys robustness up to the point that even if the information of $49\%$ of the users are corrupted, our final estimation is still approximately accurate. Finally, our results also apply to a broader range of problems such as learning discrete distributions, stochastic convex optimization, empirical risk minimization, and a variant of stochastic gradient descent via a reduction to differentially private mean estimation.