We propose a general deep learning architecture for wave-based imaging problems. A key difficulty in imaging problems with varying background wave speed is that the medium "bends" the waves differently depending on their position and direction. This space-bending geometry makes the translation equivariance of convolutional networks an undesirable inductive bias. We build an interpretable architecture based on wave physics, as captured by Fourier integral operators (FIOs). FIOs appear in the description of a wide range of wave-based imaging modalities, from seismology and radar to Doppler and ultrasound. Their geometry is characterized by a canonical relation, which governs the propagation of singularities. We learn this geometry via optimal transport in the wave packet representation. The proposed FIONet performs significantly better than standard baselines on a number of inverse problems, especially in out-of-distribution tests.