We generalize the notion of average Lipschitz smoothness proposed by Ashlagi et al. (COLT 2021), extending it to H\"older smoothness. This measure of the ``effective smoothness'' of a function is sensitive to the underlying distribution and can be dramatically smaller than its classic ``worst-case'' H\"older constant. We prove nearly tight upper and lower risk bounds in terms of the average H\"older smoothness, establishing the minimax rate in the realizable regression setting up to log factors; this was not previously known even in the special case of average Lipschitz smoothness. From an algorithmic perspective, since our notion of average smoothness is defined with respect to the unknown sampling distribution, the learner does not have an explicit representation of the function class and hence cannot execute ERM. Nevertheless, we provide a learning algorithm that achieves the (nearly) optimal learning rate. Our results hold in any totally bounded metric space, and are stated in terms of its intrinsic geometry. Overall, our results show that the classic worst-case notion of H\"older smoothness can essentially be replaced by its average, yielding considerably sharper guarantees.
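As a rough illustration of the central quantity (the notation below is schematic, chosen here for exposition, and not necessarily that of the paper), fix a metric space $(\mathcal{X},\rho)$, an exponent $\beta\in(0,1]$, and a sampling distribution $\mu$. One may define a pointwise H\"older constant of $f$ at $x$ and then average it over $\mu$ instead of taking its supremum:
\[
  % schematic definitions for exposition only; not verbatim from the paper
  \Lambda_f^\beta(x) \;=\; \sup_{x'\neq x}\frac{|f(x)-f(x')|}{\rho(x,x')^\beta},
  \qquad
  \|f\|_\beta \;=\; \sup_{x\in\mathcal{X}}\Lambda_f^\beta(x),
  \qquad
  \overline{\Lambda}_\mu^\beta(f) \;=\; \mathbb{E}_{X\sim\mu}\bigl[\Lambda_f^\beta(X)\bigr].
\]
Since an expectation never exceeds a supremum, $\overline{\Lambda}_\mu^\beta(f)\le\|f\|_\beta$ for every $f$, and the gap can be large; on the other hand, a class defined by bounding $\overline{\Lambda}_\mu^\beta$ depends on the unknown $\mu$, which is why the learner cannot write it down explicitly and run ERM.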