Abstract: Extending Moore's law by augmenting complementary metal-oxide-semiconductor (CMOS) transistors with emerging nanotechnologies (X) has become increasingly important. Accelerating Monte Carlo algorithms, which rely on random sampling, with such CMOS + X technologies could have a significant impact on a large number of fields, from probabilistic machine learning and optimization to quantum simulation. In this paper, we combine stochastic magnetic tunnel junction (sMTJ)-based probabilistic bits (p-bits) with versatile field-programmable gate arrays (FPGAs) to design a CMOS + X (X = sMTJ) prototype. Our approach provides the high-quality true randomness that is essential for Monte Carlo-based probabilistic sampling and learning. Our heterogeneous computer successfully performs probabilistic inference and asynchronous Boltzmann learning despite device-to-device variations in the sMTJs. A comprehensive comparison using a CMOS predictive process design kit (PDK) reveals that compact sMTJ-based p-bits replace 10,000 transistors while dissipating two orders of magnitude less energy (2 fJ per random bit) than digital CMOS p-bits. Scaled and integrated versions of our CMOS + stochastic nanomagnet approach can significantly advance probabilistic computing and its applications in various domains by providing massively parallel, truly random numbers with extremely high throughput and energy efficiency.