Ultrasonic imaging is being used to obtain information about the acoustic properties of a medium by emitting waves into it and recording their interaction using ultrasonic transducer arrays. The Delay-And-Sum (DAS) algorithm forms images using the main path on which reflected signals travel back to the transducers. In some applications, different insonification paths can be considered, for instance by placing the transducers at different locations or if strong reflectors inside the medium are known a-priori. These different modes give rise to multiple DAS images reflecting different geometric information about the scatterers and the challenge is to either fuse them into one image or to directly extract higher-level information regarding the materials of the medium, e.g., a segmentation map. Traditional image fusion techniques typically use ad-hoc combinations of pre-defined image transforms, pooling operations and thresholding. In this work, we propose a deep neural network (DNN) architecture that directly maps all available data to a segmentation map while explicitly incorporating the DAS image formation for the different insonification paths as network layers. This enables information flow between data pre-processing and image post-processing DNNs, trained end-to-end. We compare our proposed method to a traditional image fusion technique using simulated data experiments, mimicking a non-destructive testing application with four image modes, i.e., two transducer locations and two internal reflection boundaries. Using our approach, it is possible to obtain much more accurate segmentation of defects.