In this paper, we develop novel gradient-based online learning algorithms for an important environmental application: real-time river pollution source identification, which aims to estimate the released mass, location, and release time of a river pollution source from downstream sensor data that monitor the pollutant concentration. The problem can be formulated as a non-convex loss minimization problem in statistical learning, and our online algorithms use vectorized, adaptive step sizes to ensure high estimation accuracy across dimensions of different magnitudes. To keep the gradient-based methods from getting stuck at saddle points of the non-convex loss, we derive an "escaping from saddle points" module and multi-start versions of the algorithms, which further improve estimation accuracy by searching for the global minimizers of the loss functions. We show, theoretically and experimentally, that the algorithms achieve $O(N)$ local regret and, under a particular error bound condition on the loss functions, a high-probability cumulative regret bound of $O(N)$. A real-life river pollution source identification example shows that our algorithms outperform existing methods in estimation accuracy. Managerial insights for decision makers applying the algorithms in practice are also provided.