The problem of computing and characterizing the Region of Attraction (ROA), with its many variations, has a long tradition in safety-critical systems and control theory. Closely tied to it are Lyapunov functions, which are the centerpiece of stability theory for nonlinear dynamical systems. Autonomous agents may be imperfect because of sensor limitations that prevent them from fully observing potential adversaries in the environment. Therefore, an autonomous robot that interacts with humans should explore outdoor environments safely, avoiding dangerous states that may cause physical harm to both the system and its environment. In this paper we address this problem and propose a framework for learning policies that adapt to the shape of the largest safe region in the state space. First, the model is trained to learn an accurate safety certificate for the nonlinear closed-loop dynamics by constructing a Lyapunov Neural Network. The present work also extends previous work on computing the ROA under a fixed policy. Specifically, we discuss how to design a state-feedback controller by optimizing a typical performance objective, and we demonstrate our method on a simulated inverted pendulum, showing how the model can be used to resolve trade-offs and exploit extra design freedom.
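As an illustrative sketch only (not the paper's method), the core idea of certifying a safe region with a Lyapunov function can be shown on a damped pendulum: a quadratic candidate V(x) = xᵀPx is checked for decrease along the discretized closed-loop dynamics, and the largest sublevel set on which the decrease holds gives an inner estimate of the ROA. The dynamics constants, the grid, and the quadratic candidate below are all assumptions chosen for illustration.

```python
# Illustrative sketch only: grid-based inner estimate of a Region of
# Attraction (ROA) for a damped pendulum, using a quadratic Lyapunov
# candidate V(x) = x' P x. Constants and grid are assumptions.
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

DT, G, L, B = 0.01, 9.81, 1.0, 0.5  # step size, gravity, length, damping

def pendulum_step(x):
    """One Euler step of the damped pendulum; x = [angle, ang. velocity]."""
    theta, omega = x
    return np.array([theta + DT * omega,
                     omega + DT * (-(G / L) * np.sin(theta) - B * omega)])

# Lyapunov candidate from the linearization at the origin:
# solving Ad' P Ad - P = -I guarantees decrease for the linearized system.
A = np.array([[0.0, 1.0], [-(G / L), -B]])
Ad = np.eye(2) + DT * A
P = solve_discrete_lyapunov(Ad.T, np.eye(2))

def V(x):
    return x @ P @ x

# Check the decrease condition V(f(x)) < V(x) on a grid of states.
grid = np.array([[t, w] for t in np.linspace(-2, 2, 81)
                        for w in np.linspace(-4, 4, 81)])
v = np.array([V(x) for x in grid])
v_next = np.array([V(pendulum_step(x)) for x in grid])
decreasing = v_next < v

# The largest sublevel set {x : V(x) < c} on which the nonlinear
# dynamics still decrease V is a (sampled) inner estimate of the ROA.
violating = v[(v > 0) & ~decreasing]
c = violating.min() if violating.size else v.max()
roa_estimate = grid[v < c]
print(f"certified level c = {c:.2f}, "
      f"{len(roa_estimate)} of {len(grid)} grid states inside")
```

A learned Lyapunov Neural Network plays the role of the fixed quadratic candidate here: training V lets the certified level set adapt to the true, generally non-ellipsoidal shape of the safe region.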