Collective phenomena emerge from the interaction of natural or artificial units with a complex organization. The interplay between structural patterns and dynamics might induce functional clusters that, in general, differ from topological ones. In biological systems, like the human brain, the overall functionality is often favored by the interplay between connectivity and synchronization dynamics, with functional clusters that, in most cases, do not coincide with anatomical modules. In social, socio-technical and engineering systems, the quest for consensus favors the emergence of clusters. Despite the unquestionable evidence for the mesoscale organization of many complex systems and the heterogeneity of their inter-connectivity, a way to predict and identify the emergence of functional modules in collective phenomena continues to elude us. Here, we propose an approach based on random walk dynamics to define the diffusion distance between any pair of units in a networked system. This metric allows us to exploit the underlying diffusion geometry and provides a unifying framework for the intimate relationship between metastable synchronization, consensus and random search dynamics in complex networks, pinpointing the functional mesoscale organization of synthetic and biological systems.
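
To make the notion of a random-walk-based diffusion distance concrete, the following is a minimal sketch under stated assumptions, not the definition used in this work, which the passage above does not spell out. It assumes an undirected, connected network, a continuous-time random walk governed by the normalized Laplacian L = I - D^{-1}A, and a distance taken as the Euclidean norm between the walker's probability distributions at time t when started from two different units. The names `diffusion_distance`, `adj`, `t` and the toy network are hypothetical, chosen only for illustration.

```python
import numpy as np


def diffusion_distance(adj, t):
    """Pairwise diffusion distances at time t (illustrative assumption:
    Euclidean distance between rows of exp(-t * L), L = I - D^{-1} A)."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)  # node degrees; assumed strictly positive
    d_inv_sqrt = 1.0 / np.sqrt(deg)

    # Symmetric normalized Laplacian, similar to the random-walk Laplacian.
    l_sym = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    evals, evecs = np.linalg.eigh(l_sym)
    exp_sym = (evecs * np.exp(-t * evals)) @ evecs.T

    # exp(-t L) = D^{-1/2} exp(-t L_sym) D^{1/2}; row i is the distribution
    # of a walker at time t that started from unit i.
    probs = d_inv_sqrt[:, None] * exp_sym * np.sqrt(deg)[None, :]

    # Euclidean distance between every pair of rows.
    diff = probs[:, None, :] - probs[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Hypothetical toy network: two triangles (0,1,2) and (3,4,5) joined by edge 2-3.
adj = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    adj[i, j] = adj[j, i] = 1.0

dist = diffusion_distance(adj, t=1.0)
print(dist[0, 1] < dist[0, 4])  # units in the same triangle are closer
```

In this toy network, units within the same triangle end up closer in diffusion distance than units in different triangles, which is the kind of mesoscale signal the abstract refers to. The choice of t sets the scale being probed: for very large t all rows converge to the stationary distribution and the distances vanish.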