Reconfigurable intelligent surfaces (RIS) have been actively researched as a promising technique for future wireless communications, since they can intelligently improve the signal propagation environment. In the conventional design, each RIS element configures and reflects its received signal independently of all other RIS elements, which results in a diagonal phase shift matrix. By contrast, we propose a novel RIS architecture in which the signal impinging on one element can be reflected from another element after an appropriate phase shift adjustment. This increases the flexibility in the design of the RIS phase shifts and hence potentially improves the system performance. The resultant RIS phase shift matrix also has off-diagonal elements, as opposed to the purely diagonal structure of the conventional design. Compared to the state-of-the-art fully-connected and group-connected RIS structures, our proposed RIS architecture has lower complexity, while attaining a higher channel gain than the group-connected RIS structure and approaching that of the fully-connected RIS structure. We formulate and solve the problem of maximizing the achievable rate of our proposed RIS architecture by jointly optimizing the transmit beamforming and the non-diagonal phase shift matrix using alternating optimization and semidefinite relaxation (SDR) methods. Moreover, closed-form expressions for the channel gain, the outage probability, and the bit error ratio (BER) are derived. Simulation results demonstrate that our proposed RIS architecture achieves a higher achievable rate than the conventional architecture in both single-user and multi-user scenarios.
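
To make the contrast between diagonal and non-diagonal phase shift matrices concrete, the following minimal sketch compares the two for a single-antenna link with no direct path. It rests on assumptions that the abstract does not state: the non-diagonal matrix is taken to have exactly one unit-modulus entry per row and per column, the channels `h` (transmitter to RIS) and `g` (RIS to receiver) are perfectly known i.i.d. Rayleigh coefficients, and the elements are paired by sorted channel magnitude. The pairing rule is an illustrative choice motivated by the rearrangement inequality, not necessarily the design used in this work, and all function names are hypothetical.

```python
import cmath
import random


def rayleigh(n, rng):
    """Draw n i.i.d. complex Gaussian coefficients with unit variance."""
    s = 0.5 ** 0.5
    return [complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(n)]


def diagonal_gain(h, g):
    """Best-case gain of a diagonal RIS: element i re-radiates its own
    signal with the phase that aligns g[i] * h[i], giving (sum |g_i||h_i|)^2."""
    return sum(abs(a) * abs(b) for a, b in zip(h, g)) ** 2


def nondiagonal_theta(h, g):
    """Assumed non-diagonal structure: one unit-modulus entry per row and
    column. The signal received at element j leaves from element i, where
    the k-th weakest |h| is paired with the k-th weakest |g|."""
    n = len(h)
    in_order = sorted(range(n), key=lambda j: abs(h[j]))
    out_order = sorted(range(n), key=lambda i: abs(g[i]))
    theta = [[0j] * n for _ in range(n)]
    for j, i in zip(in_order, out_order):
        # Cancel the combined phase so every path adds coherently.
        phase = -(cmath.phase(h[j]) + cmath.phase(g[i]))
        theta[i][j] = cmath.exp(1j * phase)
    return theta


def gain(theta, h, g):
    """Channel gain |g^T Theta h|^2 for an arbitrary phase shift matrix."""
    n = len(h)
    total = sum(g[i] * theta[i][j] * h[j] for i in range(n) for j in range(n))
    return abs(total) ** 2


rng = random.Random(0)  # fixed seed so the run is repeatable
h = rayleigh(16, rng)
g = rayleigh(16, rng)
print("diagonal gain:    ", diagonal_gain(h, g))
print("non-diagonal gain:", gain(nondiagonal_theta(h, g), h, g))
```

Under these assumptions, the sorted pairing can never yield a lower gain than the diagonal configuration, because the rearrangement inequality states that a sum of products of two non-negative sequences is maximized when both are ordered alike. How large the improvement is depends on the channel realization and the number of elements.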