Effective water resource management requires information on the quality and quantity of available water, resolved in both space and time. In this paper, we study Transfer Learning (TL) through fine-tuning and parameter transfer as a way to improve the generalization performance of streamflow prediction in data-sparse regions. We fit a standard recurrent neural network, a Long Short-Term Memory (LSTM) network, on a sufficiently large source-domain dataset and repurpose the learned weights for a significantly smaller yet similar target-domain dataset. We present a methodology for applying transfer learning to spatiotemporal problems by separating the spatial and temporal components of the model and training the model to generalize using categorical datasets that represent spatial variability. The framework is developed on a rich benchmark dataset from the US and evaluated on a smaller dataset collected by The Nature Conservancy in Kenya. With our TL technique, the LSTM model generalizes to the target domain. The results of this experiment demonstrate effective predictive skill in forecasting streamflow responses when knowledge transfer and static descriptors are used to improve hydrologic model generalization in data-sparse regions.
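
The parameter-transfer and fine-tuning idea summarized above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: a linear model stands in for the LSTM, the data are synthetic, and the names (`fit`, `make_basin_data`, `DYNAMIC`, `STATIC_AND_BIAS`) as well as the choice of which weights to freeze are hypothetical. The sketch pretrains on a large source set, copies the weights, and then updates only the static-descriptor weight and bias on a small target set.

```python
import random

def predict(w, x):
    """Linear stand-in for the network; x[-1] == 1.0 acts as the bias input."""
    return sum(wi * xi for wi, xi in zip(w, x))

def mse(w, data):
    """Mean squared error of weights w over (features, target) pairs."""
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)

def fit(w, data, trainable, lr=0.1, epochs=500):
    """Full-batch gradient descent that updates only the indices in `trainable`."""
    w = list(w)  # work on a copy so the caller's weights stay untouched
    n = len(data)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = predict(w, x) - y
            for i in trainable:
                grad[i] += 2.0 * err * x[i] / n
        for i in trainable:
            w[i] -= lr * grad[i]
    return w

def make_basin_data(rng, n, static, offset):
    """Synthetic samples (assumed): [precip, temp, static descriptor, 1.0] -> flow."""
    data = []
    for _ in range(n):
        p, t = rng.random(), rng.random()
        y = 2.0 * p - 1.0 * t + 0.5 * static + offset
        data.append(([p, t, static, 1.0], y))
    return data

rng = random.Random(0)

# Large source domain: many samples from basins with different static descriptors.
source = [s for static in (0.2, 0.5, 0.8)
          for s in make_basin_data(rng, 200, static, 0.0)]

# Small target domain: few samples, similar dynamics, shifted response.
target = make_basin_data(rng, 10, 0.3, 1.0)

DYNAMIC = [0, 1]          # weights on time-varying inputs
STATIC_AND_BIAS = [2, 3]  # weight on the static descriptor, plus the bias

# Pretrain every weight on the source domain.
w_source = fit([0.0] * 4, source, trainable=DYNAMIC + STATIC_AND_BIAS)

# Parameter transfer: start from the source weights, keep the dynamic weights
# frozen, and fine-tune only the static-descriptor weight and the bias.
w_target = fit(w_source, target, trainable=STATIC_AND_BIAS, epochs=200)

loss_before = mse(w_source, target)
loss_after = mse(w_target, target)
print(f"target MSE before fine-tuning: {loss_before:.4f}")
print(f"target MSE after fine-tuning:  {loss_after:.4f}")
```

Freezing the weights on the time-varying inputs reflects the assumption that the source and target basins respond to forcing in a similar way, so the few target samples are spent only on adjusting the location-specific terms. With a real LSTM the same pattern would apply at the level of layers rather than individual weights.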