In this paper, we propose a sampling algorithm based on statistical machine learning to obtain conditional nonlinear optimal perturbation (CNOP), which is essentially different from the traditional deterministic optimization methods. The new approach does not only reduce the extremely expensive gradient (first-order) information directly by the objective value (zeroth-order) information, but also avoid the use of adjoint technique that gives rise to the huge storage problem and the instability from linearization. Meanwhile, an intuitive anlysis and a rigorous concentration inequality for the approximate gradient by sampling are shown. The numerical experiments to obtain the CNOPs by the performance of standard spatial sturctures for a theoretical model, Burgers equation with small viscosity, demonstrate that at the cost of losing accuracy, fewer samples spend time relatively shorter than the adjoint-based method and directly from definition. Finally, we reveal that the nonlinear time evolution of the CNOPs obtained by all the algorithms are almost consistent with the quantity of norm square of perturbations, their difference and relative difference on the basis of the definition method.