We are interested in multilayer graph clustering, which aims at dividing the graph nodes into categories or communities. To do so, we propose to learn a clustering-friendly embedding of the graph nodes by solving an optimization problem that involves a fidelity term to the layers of a given multilayer graph, and a regularization on the (single-layer) graph induced by the embedding. The fidelity term uses the contrastive loss to properly aggregate the observed layers into a representative embedding. The regularization pushes for a sparse and community-aware graph, and it is based on a measure of graph sparsification called "effective resistance", coupled with a penalization of the first few eigenvalues of the representative graph Laplacian matrix to favor the formation of communities. The proposed optimization problem is nonconvex but fully differentiable, and thus can be solved via the descent gradient method. Experiments show that our method leads to a significant improvement w.r.t. state-of-the-art multilayer graph clustering algorithms.