Reconfigurable intelligent surface (RIS) is regarded as a key enabler of wave/analog-domain beamforming, processing, and computing in future wireless communication systems. Recently, beyond-diagonal RIS (BD-RIS) has been proposed as a generalization of conventional RIS, offering enhanced design flexibility thanks to tunable impedances that interconnect the RIS elements. However, these additional interconnections increase circuit complexity, which poses a significant practical challenge. In this paper, we address a fundamental open question: What is the class of BD-RIS architectures that achieves the optimal performance in an RIS-aided multiuser multiple-input multiple-output (MIMO) system? By modeling BD-RIS architectures using graph theory, we identify a class of BD-RIS architectures that achieves the optimal performance, matching that of fully-connected RIS, while maintaining low circuit complexity. Our result holds for a broad class of performance metrics, including the commonly used sum channel gain, sum-rate, and energy efficiency maximization, transmit power minimization, and the information-theoretic capacity region. The number of tunable impedances in the proposed class is $\mathcal{O}(N\min\{D,N/2\})$, where $N$ denotes the number of RIS elements and $D$ is the degrees of freedom of the multiuser MIMO channel, i.e., the minimum of the number of transmit antennas and the total number of receive antennas across all users. Since $D$ is much smaller than $N$ in practice, the complexity scales as $\mathcal{O}(ND)$, which is substantially lower than the $\mathcal{O}(N^2)$ complexity of fully-connected RIS. We further introduce two novel BD-RIS architectures, band-connected RIS and stem-connected RIS, and show that they belong to the optimal architecture class under certain conditions. Simulation results validate the optimality and the enhanced performance-complexity tradeoff of the proposed architectures.
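
As a rough numerical illustration of the scaling stated above, consider the assumed values $N = 64$ and $D = 4$ (chosen here only for illustration, not taken from the paper's simulations) and drop the constant factors hidden by the order notation:

\[
N\min\{D, N/2\} = 64 \cdot \min\{4, 32\} = 256, \qquad N^2 = 64^2 = 4096, \qquad \frac{N\min\{D, N/2\}}{N^2} = \frac{1}{16}.
\]

Under the same simplification, if instead $D \geq N/2$ (for example $D = 40$ with $N = 64$), the minimum is attained by $N/2$ and the count becomes $N \cdot N/2 = 2048$, which is of the same order as $N^2$. The saving therefore comes from the regime $D \ll N$.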