Detecting interacting and conversational groups in images has applications in video surveillance and social robotics. In this paper we build on prior attempts to find conversational groups by detecting social gathering spaces called o-spaces, which are then used to assign people to groups. We make two contributions to the task: this is the first work to incorporate features extracted from the room layout image, and the first to use a deep network to generate an image representation of the proposed o-spaces. Specifically, this novel network builds on the PointNet architecture, which accepts unordered inputs of variable size. The accuracies we report show that our approach can rival, and sometimes outperform, the best existing models; however, due to a data imbalance issue, it does not yet outperform existing models in our test results.