Stacked intelligent metasurface (SIM) has emerged as a technology that enables wave-domain beamforming through multiple stacked reconfigurable intelligent surfaces (RISs). So far, SIM has been implemented only with diagonal RISs (D-RISs), while SIM implemented with beyond-diagonal RISs (BD-RISs) remains unexplored. Furthermore, no model of SIM that accounts for mutual coupling is yet available. To fill these gaps, we derive a physically consistent channel model for SIM-aided systems and clarify the assumptions needed to obtain the simplified model used in related works. Using this model, we show that a single-layer SIM implemented with a BD-RIS achieves the performance upper bound with limited complexity.