Image summary, an abridged version of the original visual content, can be used to represent the scene. Thus, tasks such as scene classification, identification, indexing, etc., can be performed efficiently using the unique summary. Saliency is the most commonly used technique for generating the relevant image summary. However, the definition of saliency is subjective in nature and depends upon the application. Existing saliency detection methods using RGB-D data mainly focus on color, texture, and depth features. Consequently, the generated summary contains either foreground objects or non-stationary objects. However, applications such as scene identification require stationary characteristics of the scene, unlike state-of-the-art methods. This paper proposes a novel volumetric saliency-guided framework for indoor scene classification. The results highlight the efficacy of the proposed method.