Abstract:We propose a data-driven approach to visually detect conversational groups by identifying spatial arrangements typical of these focused social encounters. Our approach uses a novel Deep Affinity Network (DANTE) to predict the likelihood that two individuals in a scene are part of the same conversational group, considering contextual information like the position and orientation of other nearby individuals. The predicted pair-wise affinities are then used in a graph clustering framework to identify both small (e.g., dyads) and bigger groups. The results from our evaluation on two standard benchmarks suggest that the combination of powerful deep learning methods with classical clustering techniques can improve the detection of conversational groups in comparison to prior approaches. Our technique has a wide range of applications from visual scene understanding, e.g., for surveillance, to social robotics.