We study the estimation of $\beta$ in the nonlinear model $y = f(X^{\top}\beta) + \epsilon$, where $f$ is a known nonlinear transformation, $\beta$ is sparse (only a few of its coordinates are nonzero), and the number of observations may be much smaller than the number of parameters ($n \ll p$). We show that, to bound the $L_2$ error $\|\hat\beta - \beta\|_2$ of the $L_0$-regularized estimator $\hat\beta$, it suffices to establish two conditions. Building on this result, we derive $L_2$ error bounds for (1) $L_0$-regularized maximum likelihood estimation (MLE) in exponential linear models and (2) $L_0$-regularized least squares (LS) regression in the more general case where $f$ is analytic. For the analytic case, we rely on a power series expansion of $f$, which requires taking the singularities of $f$ into account.
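To make the $L_0$-regularized LS estimator concrete, the sketch below computes it on a toy problem. It assumes a penalized form $\hat\beta = \arg\min_b \frac{1}{n}\sum_i (y_i - f(x_i^{\top} b))^2 + \lambda\|b\|_0$, where $x_i$ is the $i$-th observation (a row of `X` in the code); the abstract does not fix this form, and a constrained version ($\|b\|_0 \le s$) is equally plausible. The exhaustive search over supports, the Gauss-Newton inner fit, the choice $f = \tanh$, and the names `fit_on_support` and `l0_least_squares` are all illustrative assumptions, not the method analyzed here; exhaustive search is only feasible for very small $p$.

```python
import itertools
import numpy as np


def fit_on_support(X, y, f, df, support, iters=100, damping=1e-8):
    """Hypothetical helper: least-squares fit of y ~ f(X[:, support] @ b) by
    damped Gauss-Newton with step halving, so the squared loss never increases."""
    Xs = X[:, support]
    b = np.zeros(len(support))
    loss = np.sum((y - f(Xs @ b)) ** 2)
    for _ in range(iters):
        z = Xs @ b
        r = y - f(z)
        J = df(z)[:, None] * Xs  # Jacobian of f(Xs @ b) with respect to b
        A = J.T @ J + damping * np.eye(len(support))
        step = np.linalg.solve(A, J.T @ r)  # Gauss-Newton direction
        t, improved = 1.0, False
        while t > 1e-10:  # halve the step until the loss decreases
            b_new = b + t * step
            new_loss = np.sum((y - f(Xs @ b_new)) ** 2)
            if new_loss < loss:
                b, loss, improved = b_new, new_loss, True
                break
            t /= 2
        if not improved:
            break  # no descent step found: treat as converged
    return b


def l0_least_squares(X, y, f, df, lam, max_size):
    """Hypothetical helper: minimize (1/n) * ||y - f(X b)||^2 + lam * ||b||_0
    by exhaustive search over all supports of size <= max_size."""
    n, p = X.shape
    best_b = np.zeros(p)
    best_obj = np.mean((y - f(np.zeros(n))) ** 2)  # empty support
    for k in range(1, max_size + 1):
        for S in itertools.combinations(range(p), k):
            S = list(S)
            b = fit_on_support(X, y, f, df, S)
            obj = np.mean((y - f(X[:, S] @ b)) ** 2) + lam * k
            if obj < best_obj:
                best_obj = obj
                best_b = np.zeros(p)
                best_b[S] = b
    return best_b


# Usage on simulated, noiseless data with an assumed known link f = tanh.
rng = np.random.default_rng(0)
n, p = 40, 6
X = rng.standard_normal((n, p))
beta = np.array([1.0, 0.0, -0.8, 0.0, 0.0, 0.0])
y = np.tanh(X @ beta)
beta_hat = l0_least_squares(X, y, np.tanh, lambda z: 1 - np.tanh(z) ** 2,
                            lam=0.01, max_size=3)
print(np.flatnonzero(beta_hat))  # estimated support
```

The penalty $\lambda\|b\|_0$ charges a fixed cost per nonzero coordinate, so a superset of the true support that fits no better is rejected; with noise present, $\lambda$ would have to be tuned, and the error bounds studied in this work describe how $\|\hat\beta - \beta\|_2$ then behaves.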