This study addresses simultaneous decision making in stochastic multi-echelon inventory optimization. We consider mixed supply chain networks (SCNs) that may contain assembly nodes, distribution nodes, or both, and that may have nonlinear cost structures. We present a framework in which deep neural networks act as agents responsible for finding order-up-to levels for any desired components of a general SCN. The agents interact with the environment simultaneously, in an unsupervised manner, to minimize total inventory cost. The study thus handles several decision-makers at once across the stages of a general SCN while still producing clear and interpretable order-up-to levels. First, we demonstrate the effectiveness of the method numerically by solving newsvendor and serial supply chain problems and comparing the results with the closed-form solutions available for these settings. We then investigate a mixed SCN and a more general case study. The findings indicate that, compared to alternatives, the proposed method achieves better objective function values and requires fewer interactions with the environment. In addition, the method finds inventory policies similar to simple base-stock policies for general SCNs. Finally, we observe that for echelons closer to the source, the fixed optimal order-up-to levels can be considerably larger than the expected demand those echelons face.
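
To make the closed-form newsvendor benchmark concrete, the following minimal sketch computes the classical critical-fractile order-up-to level and checks it against a simple simulated search. It is an illustration only, not the framework proposed in this study: the normal demand distribution, the cost values, and the names `newsvendor_level` and `average_cost` are all assumptions chosen for the example.

```python
import random
from statistics import NormalDist


def newsvendor_level(mean, std, holding_cost, shortage_cost):
    """Closed-form order-up-to level for normally distributed demand.

    The optimum is the demand quantile at the critical ratio
    shortage_cost / (shortage_cost + holding_cost).
    """
    critical_ratio = shortage_cost / (shortage_cost + holding_cost)
    return NormalDist(mean, std).inv_cdf(critical_ratio)


def average_cost(level, demands, holding_cost, shortage_cost):
    """Average single-period cost of a fixed order-up-to level over sampled demands."""
    total = 0.0
    for demand in demands:
        leftover = max(level - demand, 0.0)   # units held at the end of the period
        shortfall = max(demand - level, 0.0)  # units of unmet demand
        total += holding_cost * leftover + shortage_cost * shortfall
    return total / len(demands)


# Illustrative parameters (assumed, not taken from the study).
MEAN, STD = 100.0, 20.0
HOLDING_COST, SHORTAGE_COST = 1.0, 9.0

rng = random.Random(0)  # fixed seed so the simulation is repeatable
demands = [rng.gauss(MEAN, STD) for _ in range(20000)]

closed_form = newsvendor_level(MEAN, STD, HOLDING_COST, SHORTAGE_COST)

# Brute-force search over candidate levels from 100 to 160 in steps of 0.5.
candidates = [MEAN + 0.5 * k for k in range(121)]
best_simulated = min(
    candidates,
    key=lambda s: average_cost(s, demands, HOLDING_COST, SHORTAGE_COST),
)

print(f"closed-form level: {closed_form:.2f}")
print(f"best simulated level: {best_simulated:.2f}")
```

With a shortage cost well above the holding cost, the closed-form level sits noticeably above the mean demand, and the simulated search should land close to it. A learned policy for the newsvendor setting can be validated against the same closed-form value.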