We present a novel stochastic approach to binary optimization for optimal experimental design (OED) in Bayesian inverse problems governed by mathematical models such as partial differential equations. The OED utility function, namely the regularized optimality criterion, is recast as a stochastic objective function in the form of an expectation over a multivariate Bernoulli distribution. The resulting probabilistic optimization problem is then solved with a stochastic optimization routine to find an optimal observational policy. The proposed approach is analyzed from an optimization perspective and from a machine learning perspective, where it corresponds to policy gradient reinforcement learning. The approach is demonstrated numerically on an idealized two-dimensional Bayesian linear inverse problem and validated by extensive numerical experiments on sensor placement in a parameter identification setup.
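
To make the reformulation concrete, the following minimal sketch illustrates the general idea of replacing a binary design problem with the minimization of an expectation over independent Bernoulli variables, using a score-function (policy gradient) estimator. It is an illustration under stated assumptions, not the implementation used in this work: the toy linear objective stands in for the regularized optimality criterion, the logit parametrization of the activation probabilities and the batch-mean baseline are choices made here for numerical stability, and the names `learn_policy`, `toy_objective`, `gains`, and `penalty` are hypothetical.

```python
import numpy as np


def sigmoid(w):
    """Map unconstrained logits to Bernoulli activation probabilities."""
    return 1.0 / (1.0 + np.exp(-w))


def learn_policy(J, n, steps=500, samples=64, lr=0.5, seed=0):
    """Minimize E_{z ~ Bernoulli(theta)}[J(z)] with a score-function estimator.

    J maps a binary design vector z in {0, 1}^n to a scalar cost.
    The logit parametrization theta = sigmoid(w) is an assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(n)  # theta = 0.5 for every candidate sensor initially
    for _ in range(steps):
        theta = sigmoid(w)
        # Sample a batch of binary designs from the current policy.
        z = (rng.random((samples, n)) < theta).astype(float)
        values = np.array([J(row) for row in z])
        # Batch-mean baseline: reduces variance and only rescales the
        # expected gradient by (1 - 1/samples).
        advantage = values - values.mean()
        # For independent Bernoulli variables, d/dw log P(z | w) = z - theta.
        grad = (advantage[:, None] * (z - theta)).mean(axis=0)
        w -= lr * grad
    return sigmoid(w)


# Hypothetical toy cost: reward informative sensors, penalize each active one.
gains = np.array([3.0, 0.0, 2.5, -1.0, 0.0])
penalty = 1.0


def toy_objective(z):
    return -gains @ z + penalty * z.sum()


theta = learn_policy(toy_objective, n=gains.size)
design = (theta > 0.5).astype(int)  # threshold the learned policy
print(theta.round(3), design)
```

In this toy setting the cost is linear in the design, so the learned probabilities are expected to concentrate on the sensors whose gain exceeds the penalty. Only evaluations of the cost at sampled binary designs are needed, so no derivative of the cost with respect to the design is required.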