This paper presents a three-layer spiking neural network based region proposal network that operates on data generated by neuromorphic vision sensors. The proposed architecture consists of refractory, convolution, and clustering layers built from bio-realistic leaky integrate-and-fire (LIF) neurons and synapses. The proposed algorithm is tested on traffic scene recordings from a DAVIS sensor setup. Compared with the event-based mean shift algorithm, the region proposal network achieves substantially higher recall (~50% better) at similar precision (~85%). The computational and memory complexity of the proposed method are also shown to be similar to those of event-based mean shift.
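
For readers unfamiliar with the LIF neuron model on which the three layers are built, the following is a minimal event-driven sketch in Python. It is an illustration under stated assumptions and not the implementation evaluated in this paper: the class name `LIFNeuron`, the helper `run`, the exponential leak, the reset-to-zero rule, and all parameter values (`tau`, `threshold`, `refractory`) are hypothetical choices made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class LIFNeuron:
    """Event-driven leaky integrate-and-fire neuron (illustrative sketch).

    All parameter values are assumptions for this example only.
    """
    tau: float = 20e-3         # membrane time constant in seconds (assumed)
    threshold: float = 1.0     # firing threshold (assumed)
    refractory: float = 5e-3   # refractory period in seconds (assumed)
    v: float = 0.0             # membrane potential
    last_update: float = 0.0   # time of the last processed input
    last_spike: float = float("-inf")

    def receive(self, t: float, weight: float) -> bool:
        """Process one input event at time t; return True if the neuron fires.

        Events are assumed to arrive in non-decreasing time order.
        """
        # Inputs that arrive during the refractory period are ignored.
        if t - self.last_spike < self.refractory:
            self.last_update = t
            return False
        # Leak: exponential decay of the potential since the last update.
        self.v *= math.exp(-(t - self.last_update) / self.tau)
        self.last_update = t
        # Integrate: add the synaptic weight of the incoming event.
        self.v += weight
        # Fire: emit a spike and reset once the threshold is reached.
        if self.v >= self.threshold:
            self.v = 0.0
            self.last_spike = t
            return True
        return False


def run(neuron: LIFNeuron, events):
    """Feed (time, weight) pairs to the neuron and collect spike times."""
    return [t for t, w in events if neuron.receive(t, w)]
```

In this sketch, two closely spaced inputs can sum past the threshold before the leak removes their contribution, while an input arriving within the refractory window after a spike is dropped. The refractory layer named above can be read in the same spirit, although the exact rule it applies is not specified in this abstract.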