Overcoming fiber nonlinearity is one of the core challenges limiting the capacity of optical fiber communication systems. Machine-learning-based solutions such as learned digital backpropagation (LDBP) and the recently proposed deep convolutional recurrent neural network (DCRNN) have been shown to be effective for fiber nonlinearity compensation (NLC). Incorporating distributed compensation of polarization mode dispersion (PMD) within the learned models can improve their performance further, but it also couples the compensation of nonlinearity and PMD. Consequently, the time variation of PMD must be taken into account in such a joint compensation scheme. In this paper, we investigate the impact of PMD drift on the DCRNN model with distributed compensation of PMD. We propose a transfer-learning-based selective training scheme to adapt the learned neural network model to changes in PMD. We demonstrate that fine-tuning only a small subset of weights, as prescribed by the proposed method, is sufficient for adapting the model to PMD drift. Using decision-directed feedback for online learning, we track continuous PMD drift resulting from a time-varying rotation of the state of polarization (SOP). We show that transferring knowledge from a pre-trained base model using the proposed scheme significantly reduces the retraining effort for different PMD realizations. Simulation results using the hinge model for SOP rotation show that the learned models maintain their performance gains while tracking the PMD.
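The idea of selective, decision-directed adaptation can be illustrated with a toy sketch. It is not the DCRNN model or the hinge model: it assumes a noisy dual-polarization QPSK signal, a single real 2x2 rotation that drifts linearly in time as a stand-in for PMD drift, and a plain LMS update in place of gradient-based fine-tuning. The names `track`, `run`, `base_gain`, and `pmd` are hypothetical and chosen only for illustration.

```python
import numpy as np

def qpsk_decide(z):
    """Nearest unit-energy QPSK symbol for each complex sample."""
    return (np.sign(z.real) + 1j * np.sign(z.imag)) / np.sqrt(2)

def rotation(theta):
    """Real 2x2 SOP rotation (assumed, simplified stand-in for a PMD section)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]], dtype=complex)

def track(rx, params, trainable, mu=0.05):
    """Decision-directed LMS that updates only parameters named in `trainable`."""
    out = np.empty_like(rx)
    for k in range(rx.shape[1]):
        x = params["base_gain"] * rx[:, k]   # frozen "pre-trained" part
        y = params["pmd"] @ x                # small adaptable 2x2 part
        err = qpsk_decide(y) - y             # error against the hard decision
        if "pmd" in trainable:
            # LMS step: W <- W + mu * err * x^H
            params["pmd"] = params["pmd"] + mu * np.outer(err, x.conj())
        out[:, k] = y
    return out

def run(trainable, n=4000, total_drift=1.2, sigma=0.05, seed=0):
    """Symbol error rate over the second half of a slowly drifting channel."""
    rng = np.random.default_rng(seed)
    bits = rng.integers(0, 2, size=(4, n)) * 2 - 1
    tx = (bits[:2] + 1j * bits[2:]) / np.sqrt(2)          # two polarizations
    noise = sigma * (rng.normal(size=(2, n)) + 1j * rng.normal(size=(2, n)))
    rx = np.empty_like(tx)
    for k in range(n):
        rx[:, k] = rotation(total_drift * k / n) @ tx[:, k] + noise[:, k]
    params = {"base_gain": 1.0, "pmd": np.eye(2, dtype=complex)}
    out = track(rx, params, trainable)
    half = n // 2
    return np.mean(np.abs(qpsk_decide(out[:, half:]) - tx[:, half:]) > 1e-6)

ser_tracked = run({"pmd"})   # only the small 2x2 block is adapted
ser_frozen = run(set())      # nothing is adapted; the drift goes uncompensated
print(ser_tracked, ser_frozen)
```

In this toy setting, adapting only the 2x2 block while leaving the rest of the parameters untouched should keep the error rate low under the assumed drift, whereas the fully frozen model degrades once the rotation exceeds 45 degrees. The actual scheme selects a subset of neural network weights and is evaluated on a far more realistic channel.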