We introduce GeXSe (Generative Explanatory Sensor System), a novel framework designed to extract interpretable sensor-based and vision domain features from non-invasive smart space sensors. We combine these to provide a comprehensive explanation of sensor-activation patterns in activity recognition tasks. This system leverages advanced machine learning architectures, including transformer blocks, Fast Fourier Convolution (FFC), and diffusion models, to provide a more detailed understanding of sensor-based human activity data. A standout feature of GeXSe is our unique Multi-Layer Perceptron (MLP) with linear, ReLU, and normalization layers, specially devised for optimal performance on small datasets. It also yields meaningful activation maps to explain sensor-based activation patterns. The standard approach is based on a CNN model, which our MLP model outperforms.GeXSe offers two types of explanations: sensor-based activation maps and visual domain explanations using short videos. These methods offer a comprehensive interpretation of the output from non-interpretable sensor data, thereby augmenting the interpretability of our model. Utilizing the Frechet Inception Distance (FID) for evaluation, it outperforms established methods, improving baseline performance by about 6\%. GeXSe also achieves a high F1 score of up to 0.85, demonstrating precision, recall, and noise resistance, marking significant progress in reliable and explainable smart space sensing systems.