We investigate whether transformers can learn to track a random process when given observations of a related process and parameters of the dynamical system that relates them as context. More specifically, we consider a finite-dimensional state-space model described by the state transition matrix $F$, measurement matrices $h_1, \dots, h_N$, and the process and measurement noise covariance matrices $Q$ and $R$, respectively; these parameters, randomly sampled, are provided to the transformer along with the observations $y_1,\dots,y_N$ generated by the corresponding linear dynamical system. We argue that in such settings transformers learn to approximate the celebrated Kalman filter, and empirically verify this both for the task of estimating hidden states $\hat{x}_{N|1,2,3,...,N}$ as well as for one-step prediction of the $(N+1)^{st}$ observation, $\hat{y}_{N+1|1,2,3,...,N}$. A further study of the transformer's robustness reveals that its performance is retained even if the model's parameters are partially withheld. In particular, we demonstrate that the transformer remains accurate at the considered task even in the absence of state transition and noise covariance matrices, effectively emulating operations of the Dual-Kalman filter.