We consider the problem of learning uncertainty regions for parameter estimation problems. The regions are ellipsoids that minimize the average volumes subject to a prescribed coverage probability. As expected, under the assumption of jointly Gaussian data, we prove that the optimal ellipsoid is centered around the conditional mean and shaped as the conditional covariance matrix. In more practical cases, we propose a differentiable optimization approach for approximately computing the optimal ellipsoids using a neural network with proper calibration. Compared to existing methods, our network requires less storage and less computations in inference time, leading to accurate yet smaller ellipsoids. We demonstrate these advantages on four real-world localization datasets.