Over the past few decades, the hydrology community has witnessed notable advancements in streamflow prediction, particularly with the introduction of cutting-edge machine-learning algorithms. Recurrent neural networks, especially Long Short-Term Memory (LSTM) networks, have become popular due to their capacity to create precise forecasts and realistically mimic the system dynamics. Attention-based models, such as Transformers, can learn from the entire data sequence concurrently, a feature that LSTM does not have. This work tests the hypothesis that combining recurrence with attention can improve streamflow prediction. We set up the Temporal Fusion Transformer (TFT) architecture, a model that combines both of these aspects and has never been applied in hydrology before. We compare the performance of LSTM, Transformers, and TFT over 2,610 globally distributed catchments from the recently available Caravan dataset. Our results demonstrate that TFT indeed exceeds the performance benchmark set by the LSTM and Transformers for streamflow prediction. Additionally, being an explainable AI method, TFT helps in gaining insights into the streamflow generation processes.