Abstract: Identifying optimal collective variables to model transformations using atomic-scale simulations is a long-standing challenge. We propose a new method for the generation, optimization, and comparison of collective variables, which can be thought of as a data-driven generalization of the path collective variable concept. It consists of a kernel ridge regression of the committor probability, which encodes a transformation's progress. The resulting collective variable is one-dimensional, interpretable, and differentiable, making it appropriate for enhanced sampling simulations requiring biasing. We demonstrate the validity of the method on two applications: a precipitation model and the association of Li$^+$ and F$^-$ in water. For the former, we show that global descriptors such as the permutation invariant vector make it possible to reach an accuracy far from that achieved \textit{via} simpler, more intuitive variables. For the latter, we show that information correlated with the transformation mechanism is contained in the first solvation shell only, and that inertial effects prevent the derivation of optimal collective variables from the atomic positions alone.
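To make the central idea concrete, here is a minimal sketch of a kernel ridge regression of committor values, written with NumPy. It is an illustration under stated assumptions, not the authors' implementation: the Gaussian kernel, the one-dimensional toy descriptor, the sigmoid used as a stand-in committor, and the values of `sigma` and `ridge` are all hypothetical choices made for this example.

```python
import numpy as np


def gaussian_kernel(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma**2))


def fit_krr(X, q, sigma, ridge):
    """Solve (K + ridge * I) alpha = q for the regression weights alpha."""
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + ridge * np.eye(len(X)), q)


def predict_krr(X_new, X_train, alpha, sigma):
    """Evaluate the regressed committor at new descriptor values."""
    return gaussian_kernel(X_new, X_train, sigma) @ alpha


# Toy training set (assumption): a 1D descriptor and a sigmoid "committor".
rng = np.random.default_rng(0)
X_train = rng.uniform(-2.0, 2.0, size=(200, 1))
q_train = 1.0 / (1.0 + np.exp(-4.0 * X_train[:, 0]))

# Hyperparameters are illustrative, not tuned values from the paper.
sigma, ridge = 0.5, 1e-3
alpha = fit_krr(X_train, q_train, sigma, ridge)

# Compare the regression with the toy committor on a few test points.
X_test = np.linspace(-1.5, 1.5, 7)[:, None]
q_pred = predict_krr(X_test, X_train, alpha, sigma)
q_true = 1.0 / (1.0 + np.exp(-4.0 * X_test[:, 0]))
max_err = np.max(np.abs(q_pred - q_true))
print("maximum absolute error on test points:", max_err)
```

The prediction is a weighted sum of smooth kernel functions centered on the training configurations, so it is a one-dimensional, differentiable function of the descriptor, which is the property needed for biasing in enhanced sampling.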