Wi-Fi devices can effectively be used as passive radar systems that sense what happens in the surroundings and can even discern human activity. We propose, for the first time, a principled architecture which employs Variational Auto-Encoders for estimating a latent distribution responsible for generating the data, and Evidential Deep Learning for its ability to sense out-of-distribution activities. We verify that the fused data processed by different antennas of the same Wi-Fi receiver results in increased accuracy of human activity recognition compared with the most recent benchmarks, while still being informative when facing out-of-distribution samples and enabling semantic interpretation of latent variables in terms of physical phenomena. The results of this paper are a first contribution toward the ultimate goal of providing a flexible, semantic characterisation of black-swan events, i.e., events for which we have limited to no training data.