Though parameter-shift rules have drastically improved gradient estimation for several types of quantum circuits, leading to improved performance in downstream tasks, so far they have not been transferable to linear optics with single photons. In this work, we derive an analytical formula for the gradients in these circuits with respect to phase shifters via a generalized parameter-shift rule, where the number of parameter shifts depends linearly on the total number of photons. Experimentally, this enables access to derivatives in photonic systems without the need for finite-difference approximations. Building on this, we propose two strategies for reducing the number of shifts in the expression, and hence the overall sample complexity. Numerically, we show that optimization with this generalized parameter-shift rule can converge to the minimum of a cost function in fewer parameter update steps than alternative techniques. We anticipate that this method will open up new avenues for solving optimization problems with photonic systems, as well as provide new techniques for the experimental characterization and control of linear optical systems.
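As an illustration of how the shift count can scale linearly with photon number, the following minimal sketch applies a standard equidistant-shift rule for trigonometric polynomials. It is not the formula derived in this work. It assumes that the measured quantity, viewed as a function of a single phase-shifter angle, is a trigonometric polynomial of degree at most n for n photons, in which case 2n shifted evaluations give the exact derivative. The names `shift_rule_derivative`, `toy_expectation`, and `toy_derivative` are hypothetical, and the toy function merely stands in for an experimentally estimated expectation value.

```python
import math


def shift_rule_derivative(f, theta, n_photons):
    """Exact derivative of f at theta from 2 * n_photons shifted evaluations.

    Assumption: f is a trigonometric polynomial in theta with integer
    frequencies 0..n_photons (an illustrative model, not the paper's formula).
    """
    r = n_photons
    total = 0.0
    for mu in range(1, 2 * r + 1):
        # Equidistant shifts: (2*mu - 1) * pi / (2*r), mu = 1..2r
        shift = (2 * mu - 1) * math.pi / (2 * r)
        # Alternating-sign weights for the equidistant-shift rule
        coeff = (-1) ** (mu - 1) / (4 * r * math.sin(shift / 2) ** 2)
        total += coeff * f(theta + shift)
    return total


def toy_expectation(theta):
    # Hypothetical stand-in for a 3-photon expectation value (frequencies <= 3)
    return (0.3 + 0.5 * math.cos(theta) - 0.2 * math.sin(2 * theta)
            + 0.1 * math.cos(3 * theta + 0.4))


def toy_derivative(theta):
    # Analytic derivative of toy_expectation, used only as a reference
    return (-0.5 * math.sin(theta) - 0.4 * math.cos(2 * theta)
            - 0.3 * math.sin(3 * theta + 0.4))


if __name__ == "__main__":
    theta0 = 0.7
    print(shift_rule_derivative(toy_expectation, theta0, n_photons=3))
    print(toy_derivative(theta0))
```

With n = 1 the rule reduces to the familiar two-term form, with shifts of ±π/2 and weights ±1/2. In an experiment each call to `f` would be a sampled estimate, so the derivative inherits statistical error from all 2n evaluations, which is why reducing the number of shifts lowers sample complexity.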