Examining graphs for similarity is a well-known challenge, but one that is mandatory for grouping graphs together. We present a data-driven method to cluster traffic scenes that is self-supervised, i.e. without manual labelling. We leverage the semantic scene graph model to create a generic graph embedding of the traffic scene, which is then mapped to a low-dimensional embedding space using a Siamese network, in which clustering is performed. In the training process of our novel approach, we augment existing traffic scenes in the Cartesian space to generate positive similarity samples. This allows us to overcome the challenge of reconstructing a graph and at the same time obtain a representation to describe the similarity of traffic scenes. We could show, that the resulting clusters possess common semantic characteristics. The approach was evaluated on the INTERACTION dataset.