Proactive resource allocation, such as proactive caching at the wireless edge, has shown promising gains in boosting network performance and improving user experience by leveraging big data and machine learning. Earlier research focused on optimizing proactive policies under the assumption that the future knowledge required for the optimization is perfectly known. More recently, various machine learning techniques have been proposed to predict the required knowledge, such as file popularity, and the predictions are then treated as true values in the optimization. In this paper, we introduce a \emph{proactive optimization} framework for proactive resource allocation, in which the future knowledge is predicted implicitly from historical observations by the optimization itself. To this end, we formulate a proactive optimization problem, taking proactive caching and bandwidth allocation as an example. Its objective function is the conditional expectation of the successful offloading probability over the unknown future popularity, given the historically observed popularity. Because this problem depends on the conditional distribution of future information given current and past information, we transform it into an equivalent problem that depends on the joint distribution of future and historical popularity. We then resort to stochastic optimization to learn the joint distribution, and to unsupervised learning with neural networks to learn the optimal policy. The neural networks can be trained offline, or online to adapt to a dynamic environment. Simulation results on a real dataset validate that the proposed framework can indeed predict file popularity implicitly through optimization.
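The step from a conditional to a joint expectation rests on a standard interchange. Write the observed history as $\mathbf{h}$, the future popularity as $\mathbf{p}$, a policy as $\mathbf{x}(\cdot)$ and the objective as $f$ (hypothetical notation). If the policy may be any function of $\mathbf{h}$, then under mild regularity conditions, maximizing $\mathbb{E}_{\mathbf{h},\mathbf{p}}[f(\mathbf{x}(\mathbf{h}),\mathbf{p})]$ over policies is equivalent to maximizing $\mathbb{E}_{\mathbf{p}}[f(\mathbf{x},\mathbf{p}) \mid \mathbf{h}]$ for every $\mathbf{h}$. The joint form can be attacked with stochastic gradients on sampled $(\mathbf{h},\mathbf{p})$ pairs.

The Python sketch below illustrates this idea on a toy problem. It is not the paper's model. The following are all assumptions made for illustration:
the single-slot probabilistic cache,
the objective $\mathbf{p}^{\mathsf T}\mathbf{x}$,
the hypothetical generator `sample_pair`, in which the hot file shifts to the next index in the future window, and
the linear-softmax policy, which stands in for a neural network.
No label of future popularity or of the optimal policy is used. The policy is trained on the objective itself.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax."""
    e = np.exp(z - z.max())
    return e / e.sum()


def sample_pair(rng, n_files, hot=0.7):
    """Hypothetical draw of (historical, future) popularity from a joint distribution.

    One file is 'hot' in the observed window; in the next window the popularity
    moves to the next file index (a toy temporal correlation, not real data).
    """
    k = rng.integers(n_files)
    hist = np.full(n_files, (1.0 - hot) / n_files)
    hist[k] += hot
    future = np.roll(hist, 1)  # file (k + 1) mod n_files becomes hot
    return hist, future


def offload_prob(x, p):
    """Toy objective: fraction of requests served when file n is cached w.p. x[n]."""
    return float(p @ x)


def train_policy(n_files=5, steps=3000, lr=0.5, seed=0):
    """Unsupervised training: ascend the sampled objective, with no popularity labels."""
    rng = np.random.default_rng(seed)
    W = np.zeros((n_files, n_files))
    b = np.zeros(n_files)
    for _ in range(steps):
        h, p = sample_pair(rng, n_files)  # one sample of the joint distribution
        x = softmax(W @ h + b)            # caching policy x(h); it sees only history
        grad_z = x * (p - p @ x)          # gradient of p^T softmax(z) w.r.t. z
        W += lr * np.outer(grad_z, h)     # stochastic gradient ascent step
        b += lr * grad_z
    return W, b


def evaluate(policy, n_files=5, trials=2000, seed=1):
    """Monte Carlo estimate of E[offload_prob(policy(h), p)] on fresh samples."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(trials):
        h, p = sample_pair(rng, n_files)
        total += offload_prob(policy(h), p)
    return total / trials


W, b = train_policy()
learned = evaluate(lambda h: softmax(W @ h + b))
# Hypothetical baseline: treat history as the prediction and cache its most popular file.
naive = evaluate(lambda h: np.eye(len(h))[np.argmax(h)])
print(f"learned policy: {learned:.3f}, naive policy: {naive:.3f}")
```

On this toy data, caching the historically most popular file captures only the background popularity. The trained policy should instead learn to cache the file that becomes popular next. In that sense it predicts the popularity shift implicitly through the optimization, without ever forming an explicit forecast of $\mathbf{p}$.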