Existing neural response generation models have achieved impressive improvements on two-party conversations, where utterances are assumed to be sequentially organized. However, many real-world dialogues involve multiple interlocutors, and the structure of the conversational context is much more complex; e.g., utterances from different interlocutors can occur "in parallel". To address this challenge, prior work models the relations among utterances or interlocutors so that response generation is conditioned on a clearer context. Nonetheless, these methods rely heavily on such relations and all assume that they are given beforehand, which is impractical and limits their generality. In this work, we propose to automatically infer the relations via relational thinking on subtle clues inside the conversational context, without any human labels, and to leverage these relations to guide neural response generation. Specifically, we first apply a deep graph random process to consider all possible relations among utterances in the conversational context. Then the inferred relation graphs are integrated with a variational auto-encoder framework to train a GAN for structure-aware response generation. Experimental results on the Ubuntu Internet Relay Chat (IRC) channel benchmark and the most recent Movie Dialogues show that our method outperforms various baseline models for multi-party response generation.