Victor Fortier

Exploiting temporal information to detect conversational groups in videos and predict the next speaker

Aug 29, 2024

Lucrezia Tosato, Victor Fortier, Isabelle Bloch, Catherine Pelachaud

Abstract: Studies in human-human interaction have introduced the concept of F-formation to describe the spatial arrangement of participants during social interactions. This paper has two objectives: detecting F-formations in video sequences and predicting the next speaker in a group conversation. The proposed approach exploits temporal information and human multimodal signals in video sequences. In particular, we rely on measuring the engagement level of people as a feature of group belonging. Our approach makes use of a recurrent neural network, the Long Short-Term Memory (LSTM), to predict who will take the next speaking turn in a conversation group. Experiments on the MatchNMingle dataset led to 85% true positives in group detection and 98% accuracy in predicting the next speaker.

* Pattern Recognition Letters, Volume 177, January 2024, Pages 164-168
* Accepted to Pattern Recognition Letters, 8 pages, 10 figures
