In this work, we consider the problem of jointly estimating a set of room impulse responses (RIRs) corresponding to closely spaced microphones. Accurate estimation of RIRs is crucial in acoustic applications such as speech enhancement, noise cancellation, and auralization. However, real-world constraints such as short excitation signals, low signal-to-noise ratios, and poor spectral excitation often render the estimation problem ill-posed. We address these challenges by means of optimal mass transport (OMT) regularization. In particular, we propose to use an OMT barycenter, or generalized mean, as a mechanism for information sharing between the microphones. This allows us to quantify and exploit similarities in the delay structures of the different microphones without imposing rigid assumptions on the room acoustics. The resulting estimator is formulated as the solution to a convex optimization problem, which can be solved using standard solvers. In numerical examples, we demonstrate the potential of the proposed method in addressing otherwise ill-conditioned estimation scenarios.
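
To give some intuition for why an OMT barycenter is a natural generalized mean for delay structures, the following minimal sketch compares it with an ordinary pointwise average for two single-peak delay profiles. It is an illustration only and not the proposed estimator: the squared-distance transport cost, the one-dimensional quantile-averaging construction, the Gaussian peak shapes, the grid, and all names (`peak`, `quantile_fn`, `spread`) are assumptions made for this example.

```python
import numpy as np

# Hypothetical delay grid in milliseconds (an assumption for this sketch).
grid = np.linspace(0.0, 12.0, 1201)

def peak(delay, width=0.3):
    """Nonnegative single-peak delay profile centered at `delay`."""
    return np.exp(-0.5 * ((grid - delay) / width) ** 2)

def quantile_fn(hist, levels):
    """Inverse CDF of a nonnegative histogram on `grid`, evaluated at `levels`."""
    cdf = np.cumsum(hist)
    cdf = cdf / cdf[-1]                      # normalize to unit mass
    idx = np.searchsorted(cdf, levels)       # first grid index with cdf >= level
    return grid[np.clip(idx, 0, len(grid) - 1)]

def spread(hist):
    """Standard deviation of a histogram on `grid`."""
    w = hist / hist.sum()
    mean = np.sum(w * grid)
    return np.sqrt(np.sum(w * (grid - mean) ** 2))

# Two microphones observing the same reflection at slightly different delays.
h1, h2 = peak(5.0), peak(7.0)

# In one dimension with squared-distance cost, the OMT barycenter has a
# quantile function equal to the average of the individual quantile functions.
levels = (np.arange(2000) + 0.5) / 2000
q_bar = 0.5 * (quantile_fn(h1, levels) + quantile_fn(h2, levels))

bary_center = q_bar.mean()   # location of the barycenter's mass
bary_spread = q_bar.std()    # width of the barycenter's single peak

# The pointwise (Euclidean) average instead keeps both peaks.
euclid_spread = spread(0.5 * (h1 + h2))

print(f"barycenter: center {bary_center:.2f} ms, spread {bary_spread:.2f} ms")
print(f"pointwise mean: spread {euclid_spread:.2f} ms")
```

Under these assumptions, the barycenter is a single peak located between the two delays with roughly the original width, whereas the pointwise average is bimodal and considerably wider. This is the sense in which a transport-based mean can capture similarity between delay structures that are shifted relative to each other.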