Semantic communication (SemComm) has emerged as a new paradigm shift. Most existing SemComm systems transmit continuously distributed signals in an analog fashion. However, the analog paradigm is not compatible with current digital communication frameworks. In this paper, we propose an alternating multi-phase training strategy (AMP) to enable joint training of the encoder and decoder networks through non-differentiable digital processes. AMP contains three training phases, targeting feature extraction (FE), robustness enhancement (RE), and training-testing alignment (TTA), respectively. In the FE phase, we learn to represent semantic information by training the encoder and decoder end-to-end in an analog manner. Once digital communication is taken into consideration, the domain shift between analog and digital transmission requires fine-tuning the encoder and decoder. To handle joint training through the non-differentiable digital processes, the RE phase alternates between updating the decoder individually and jointly training the codec. To boost robustness further, we investigate a mask-attack (MATK) in RE that simulates an evident and severe bit-flipping effect in a differentiable manner. To address the training-testing inconsistency introduced by MATK, we employ an additional TTA phase that fine-tunes the decoder without MATK. Combining AMP with an information restoration network, we propose a digital SemComm system for image transmission, named AMP-SC. Compared with the representative benchmark, AMP-SC achieves $0.82 \sim 1.65$ dB higher average reconstruction performance across various representative datasets at different scales and over a wide range of signal-to-noise ratios.
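The three-phase structure of AMP can be pictured as a schedule of training steps. The following is a minimal sketch, not the implementation used in this paper: the names `Step` and `amp_schedule`, the step counts, the one-to-one alternation ratio inside RE, and the choice to keep MATK active in both kinds of RE step are all illustrative assumptions. Only the ordering of the phases, which networks are updated in each, and the absence of MATK in TTA follow from the description above.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class Step:
    """One training step of a hypothetical AMP schedule."""
    phase: str           # "FE", "RE", or "TTA"
    train_encoder: bool  # whether the encoder weights are updated
    train_decoder: bool  # whether the decoder weights are updated
    use_matk: bool       # whether the mask-attack (MATK) is applied


def amp_schedule(fe_steps: int, re_rounds: int, tta_steps: int) -> Iterator[Step]:
    """Yield steps in the order FE -> RE -> TTA (hypothetical helper)."""
    # FE: encoder and decoder are trained end-to-end in an analog manner.
    for _ in range(fe_steps):
        yield Step("FE", train_encoder=True, train_decoder=True, use_matk=False)

    # RE: alternate a decoder-only update with a joint codec update.
    # Assumption: one step of each kind per round, MATK active in both.
    for _ in range(re_rounds):
        yield Step("RE", train_encoder=False, train_decoder=True, use_matk=True)
        yield Step("RE", train_encoder=True, train_decoder=True, use_matk=True)

    # TTA: fine-tune the decoder alone, without MATK.
    for _ in range(tta_steps):
        yield Step("TTA", train_encoder=False, train_decoder=True, use_matk=False)


if __name__ == "__main__":
    for step in amp_schedule(fe_steps=1, re_rounds=2, tta_steps=1):
        print(step)
```

In a real training loop, each yielded `Step` would decide which parameters receive gradient updates and whether the forward pass includes the mask-attack. The step counts passed in here are placeholders.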