We propose a paradigm shift in the data-driven modeling of the instrumental response field of telescopes. By adding a differentiable optical forward model into the modeling framework, we change the data-driven modeling space from the pixels to the wavefront. This allows to transfer a great deal of complexity from the instrumental response into the forward model while being able to adapt to the observations, remaining data-driven. Our framework allows a way forward to building powerful models that are physically motivated, interpretable, and that do not require special calibration data. We show that for a simplified setting of a space telescope, this framework represents a real performance breakthrough compared to existing data-driven approaches with reconstruction errors decreasing 5 fold at observation resolution and more than 10 fold for a 3x super-resolution. We successfully model chromatic variations of the instrument's response only using noisy broad-band in-focus observations.