Recently, beyond diagonal reconfigurable intelligent surface (BD-RIS) has been proposed as a generalization of conventional RIS. The scattering matrix of a BD-RIS is not restricted to be diagonal, which yields a performance improvement over conventional RIS. Although several BD-RIS architectures have been proposed, it remains an open problem to develop a systematic approach for designing BD-RIS architectures that achieve the optimal trade-off between performance and circuit complexity. In this work, we propose novel modeling, architecture design, and optimization for BD-RIS based on graph theory. This graph-theoretical modeling allows us to develop two new efficient BD-RIS architectures, denoted as tree-connected and forest-connected RIS. Tree-connected RIS, whose corresponding graph is a tree, is proven to be the least complex BD-RIS architecture able to achieve the performance upper bound in multiple-input single-output (MISO) systems. Forest-connected RIS, in turn, strikes a balance between performance and complexity, further reducing the complexity relative to tree-connected RIS. To optimize tree-connected RIS, we derive a closed-form globally optimal solution, while forest-connected RIS is optimized through a low-complexity iterative algorithm. Numerical results confirm that tree-connected (resp. forest-connected) RIS achieves the same performance as fully-connected (resp. group-connected) RIS, while reducing the complexity by a factor of up to 16.4.
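To make the complexity comparison concrete, the following sketch counts tunable circuit components for each architecture from its graph. It rests on an assumption that is not stated in the text above: each RIS port contributes one tunable admittance to ground, and each edge of the architecture graph contributes one tunable admittance between two ports. The function names, the group size `g`, and the choice of 64 elements are hypothetical and serve only as an illustration.

```python
def components(n_ports: int, n_edges: int) -> int:
    """Assumed complexity measure: one tunable admittance per port
    (to ground) plus one per edge of the architecture graph."""
    return n_ports + n_edges


def fully_connected(n: int) -> int:
    # Complete graph on n nodes: n * (n - 1) / 2 edges.
    return components(n, n * (n - 1) // 2)


def tree_connected(n: int) -> int:
    # Any tree on n nodes has exactly n - 1 edges.
    return components(n, n - 1)


def _num_groups(n: int, g: int) -> int:
    if g <= 0 or n % g != 0:
        raise ValueError("group size g must be positive and divide n")
    return n // g


def group_connected(n: int, g: int) -> int:
    # n / g disjoint groups, each a complete graph on g nodes.
    return _num_groups(n, g) * fully_connected(g)


def forest_connected(n: int, g: int) -> int:
    # n / g disjoint groups, each a tree on g nodes (a forest overall).
    return _num_groups(n, g) * tree_connected(g)


if __name__ == "__main__":
    n, g = 64, 4  # hypothetical sizes, chosen only for illustration
    print("fully-connected :", fully_connected(n))
    print("tree-connected  :", tree_connected(n))
    print("group-connected :", group_connected(n, g))
    print("forest-connected:", forest_connected(n, g))
    print("fully/tree ratio:", round(fully_connected(n) / tree_connected(n), 1))
```

Under these assumptions, 64 elements give 2080 components for the fully-connected graph and 127 for a tree, a ratio of roughly 16.4. This is consistent with the reduction quoted above, although the text does not state which configuration produces that figure.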