The crossMoDA2023 challenge aims to segment the vestibular schwannoma (VS, sub-divided into intra- and extra-meatal components) and cochlea regions of unlabeled hrT2 scans by leveraging labeled ceT1 scans. In this work, we propose a 3D multi-style cross-modality segmentation framework for the crossMoDA2023 challenge, consisting of a multi-style translation phase and a self-training segmentation phase. Considering the heterogeneous intensity distributions and varying image sizes of multi-institutional scans, we first apply min-max normalization, voxel-size resampling, and center cropping to obtain fixed-size sub-volumes from the ceT1 and hrT2 scans for training. We then perform the multi-style image translation phase to overcome the intensity distribution discrepancy between the unpaired multi-modal scans. Specifically, we design three different translation networks with 2D or 2.5D inputs to generate multi-style, realistic target-like volumes from the labeled ceT1 volumes. Finally, we perform the self-training volumetric segmentation phase in the target domain, which employs the nnU-Net framework together with an iterative self-training scheme based on pseudo-labels to train accurate segmentation models on the unlabeled target domain. On the crossMoDA2023 validation dataset, our method produces promising results, achieving mean DSC values of 72.78% and 80.64% and ASSD values of 5.85 mm and 0.25 mm for the VS tumor and cochlea regions, respectively. Moreover, for the intra- and extra-meatal regions, our method achieves DSC values of 59.77% and 77.14%, respectively.
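
To make the preprocessing step concrete, the sketch below shows a minimal NumPy/SciPy implementation of min-max normalization, voxel-size resampling, and center cropping (with zero padding when a dimension is too small). The function name `preprocess_volume`, the target spacing, and the crop size are illustrative assumptions, not the exact settings used in our challenge submission.

```python
import numpy as np
from scipy.ndimage import zoom

def preprocess_volume(volume, spacing, target_spacing=(1.0, 1.0, 1.0),
                      crop_size=(128, 128, 128)):
    """Min-max normalize, resample to a common voxel size, and
    center-crop (or zero-pad) to a fixed-size sub-volume.

    Note: target_spacing and crop_size are example values only.
    """
    # Min-max intensity normalization to [0, 1].
    v_min, v_max = volume.min(), volume.max()
    volume = (volume - v_min) / (v_max - v_min + 1e-8)

    # Resample to the target voxel spacing (order=1 -> trilinear interpolation).
    factors = [s / t for s, t in zip(spacing, target_spacing)]
    volume = zoom(volume, factors, order=1)

    # Center crop, padding with zeros along any dimension that is too small.
    out = np.zeros(crop_size, dtype=volume.dtype)
    src_slices, dst_slices = [], []
    for dim, size in enumerate(crop_size):
        diff = volume.shape[dim] - size
        if diff >= 0:  # volume larger than target: crop centrally
            start = diff // 2
            src_slices.append(slice(start, start + size))
            dst_slices.append(slice(0, size))
        else:          # volume smaller than target: pad centrally
            start = (-diff) // 2
            src_slices.append(slice(0, volume.shape[dim]))
            dst_slices.append(slice(start, start + volume.shape[dim]))
    out[tuple(dst_slices)] = volume[tuple(src_slices)]
    return out
```

The same routine can be applied to both ceT1 and hrT2 scans so that all training sub-volumes share a common intensity range, voxel spacing, and spatial size before the translation and segmentation phases.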