We study the theoretical limits of the $\ell_0$ (quasi) norm based optimization algorithms when employed for solving classical compressed sensing or sparse regression problems. Considering standard contexts with deterministic signals and statistical systems, we utilize \emph{Fully lifted random duality theory} (Fl RDT) and develop a generic analytical program for studying performance of the \emph{maximum-likelihood} (ML) decoding. The key ML performance parameter, the residual \emph{root mean square error} ($\textbf{RMSE}$), is uncovered to exhibit the so-called \emph{phase-transition} (PT) phenomenon. The associated aPT curve, which separates the regions of systems dimensions where \emph{an} $\ell_0$ based algorithm succeeds or fails in achieving small (comparable to the noise) ML optimal $\textbf{RMSE}$ is precisely determined as well. In parallel, we uncover the existence of another dPT curve which does the same separation but for practically feasible \emph{descending} $\ell_0$ ($d\ell_0$) algorithms. Concrete implementation and practical relevance of the Fl RDT typically rely on the ability to conduct a sizeable set of the underlying numerical evaluations which reveal that for the ML decoding the Fl RDT converges astonishingly fast with corrections in the estimated quantities not exceeding $\sim 0.1\%$ already on the third level of lifting. Analytical results are supplemented by a sizeable set of numerical experiments where we implement a simple variant of $d\ell_0$ and demonstrate that its practical performance very accurately matches the theoretical predictions. Completely surprisingly, a remarkably precise agreement between the simulations and the theory is observed for fairly small dimensions of the order of 100.