Structural similarity (SSIM)-based distortion $D_\text{SSIM}$ is more consistent with human perception than the traditional mean squared error $D_\text{MSE}$. To achieve better video quality, many studies on optimal bit allocation (OBA) and rate-distortion optimization (RDO) used $D_\text{SSIM}$ as the distortion metric. However, many of them failed to optimize OBA and RDO jointly based on SSIM, thus causing a non-optimal R-$D_\text{SSIM}$ performance. This problem is due to the lack of an accurate R-$D_\text{SSIM}$ model that can be used uniformly in both OBA and RDO. To solve this problem, we propose a $D_\text{SSIM}$-$D_\text{MSE}$ model first. Based on this model, the complex R-$D_\text{SSIM}$ cost in RDO can be calculated as simpler R-$D_\text{MSE}$ cost with a new SSIM-related Lagrange multiplier. This not only reduces the computation burden of SSIM-based RDO, but also enables the R-$D_\text{SSIM}$ model to be uniformly used in OBA and RDO. Moreover, with the new SSIM-related Lagrange multiplier in hand, the joint relationship of R-$D_\text{SSIM}$-$\lambda_\text{SSIM}$ (the negative derivative of R-$D_\text{SSIM}$) can be built, based on which the R-$D_\text{SSIM}$ model parameters can be calculated accurately. With accurate and unified R-$D_\text{SSIM}$ model, SSIM-based OBA and SSIM-based RDO are unified together in our scheme, called SOSR. Compared with the HEVC reference encoder HM16.20, SOSR saves 4%, 10%, and 14% bitrate under the same SSIM in all-intra, hierarchical and non-hierarchical low-delay-B configurations, which is superior to other state-of-the-art schemes.