The contribution of this paper is twofold. First, we present DeepBiRD, a novel approach inspired by human visual perception that exploits layout features to identify individual references in a scientific publication. Second, we present a new dataset for image-based reference detection comprising 2401 scans containing 12244 references, all manually annotated at the level of individual references. The proposed approach consists of two stages. In the first stage, it determines whether a given document image is single-column or multi-column and, using this information, splits the document image into individual columns. In the second stage, it performs layout-driven reference detection using Mask R-CNN. DeepBiRD was evaluated on two different datasets to demonstrate its generalization. It achieved an F-measure of 0.96 on our dataset and detected 2.5 times more references than the current state-of-the-art approach on that approach's own dataset. These results suggest that DeepBiRD is significantly superior in performance, generalizable, and independent of any domain or referencing style.
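The column-splitting step of the first stage can be illustrated with a minimal sketch. It assumes that the page has already been binarized (1 for ink, 0 for background) and that the single-column versus multi-column decision is available from an earlier step. The gutter is located with a simple vertical projection profile, which is a heuristic chosen here for illustration and not necessarily the procedure used in DeepBiRD. The names `split_columns` and `band_fraction` are hypothetical, and only the two-column case is handled.

```python
import numpy as np


def split_columns(page, is_two_column, band_fraction=0.2):
    """Split a binarized page (1 = ink, 0 = background) into column images.

    Assumption: the layout decision `is_two_column` comes from an earlier
    classification step; this function only performs the split.
    """
    if not is_two_column:
        # Single-column pages are passed through unchanged.
        return [page]

    width = page.shape[1]

    # Vertical projection profile: amount of ink in each pixel column.
    ink_per_x = page.sum(axis=0)

    # Search for the gutter only in a band around the page centre.
    half_band = max(1, int(width * band_fraction) // 2)
    lo = width // 2 - half_band
    hi = width // 2 + half_band

    # The gutter is taken to be the pixel column with the least ink.
    gutter = lo + int(np.argmin(ink_per_x[lo:hi]))

    return [page[:, :gutter], page[:, gutter:]]


# Synthetic two-column page: ink in x = 5..44 and x = 55..94.
demo_page = np.zeros((10, 100), dtype=np.uint8)
demo_page[:, 5:45] = 1
demo_page[:, 55:95] = 1
left, right = split_columns(demo_page, is_two_column=True)
print(left.shape, right.shape)
```

Each returned column image could then be passed to the detection stage separately. Real scans are noisier than this synthetic page, so a practical version would likely smooth the profile or require a minimum gutter width before trusting the split.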