Quick and accurate assessment of the damage state of buildings after natural disasters is crucial for undertaking properly targeted rescue and subsequent recovery operations, which can have a major impact on the safety of victims and the cost of disaster recovery. The quality of such a process can be significantly improved by harnessing the potential of machine learning methods in computer vision. This paper presents a novel damage assessment method using an original multi-step feature fusion network for the classification of the damage state of buildings based on pre- and post-disaster large-scale satellite images. We introduce a novel convolutional neural network (CNN) module that performs feature fusion at multiple network levels between pre- and post-disaster images in the horizontal and vertical directions of CNN network. An additional network element - Fuse Module - was proposed to adapt any CNN model to analyze image pairs in the issue of pair classification. We use, open, large-scale datasets (IDA-BD and xView2) to verify, that the proposed method is suitable to improve on existing state-of-the-art architectures. We report over a 3 percentage point increase in the accuracy of the Vision Transformer model.