Finger photo Presentation Attack Detection (PAD) can significantly strengthen smartphone device security. However, these algorithms are trained to detect certain types of attacks. Furthermore, they are designed to operate on images acquired by specific capture devices, leading to poor generalization and a lack of robustness in handling the evolving nature of mobile hardware. The proposed investigation is the first to systematically analyze the performance degradation of existing deep learning PAD systems, convolutional and transformers, in cross-capture device settings. In this paper, we introduce the ColFigPhotoAttnNet architecture designed based on window attention on color channels, followed by the nested residual network as the predictor to achieve a reliable PAD. Extensive experiments using various capture devices, including iPhone13 Pro, GooglePixel 3, Nokia C5, and OnePlusOne, were carried out to evaluate the performance of proposed and existing methods on three publicly available databases. The findings underscore the effectiveness of our approach.