Abstract: The extraction of spectral features from a music clip is a computationally expensive task, because extracting accurate features typically requires processing the clip over its whole length. This preprocessing step creates a large overhead and slows down the extraction process. We show how formatting a dataset in a particular way can make the process more efficient, eliminating the need to process the clip for its full duration while still extracting the features accurately. In addition, we discuss the possibility of defining fixed, generic durations for analyzing a given type of music clip during training, and in doing so we reduce the processed clip duration to just 10% of the global average.
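To make the idea concrete, the following is a minimal sketch of extracting spectral features from only a fraction of a clip rather than its full length. It is illustrative only: the librosa library, the 30-second nominal clip length, the 10% analysis fraction, the specific feature set, and the `extract_partial_features` helper are all assumptions for this example, not the paper's actual pipeline.

```python
# Sketch: compute spectral features from only the first ~10% of a clip.
# Assumptions (not from the paper): librosa as the feature library,
# 30-second clips, and an analysis window starting at offset 0.

import librosa
import numpy as np

CLIP_SECONDS = 30.0        # assumed nominal clip length in the dataset
ANALYSIS_FRACTION = 0.10   # process only ~10% of the clip

def extract_partial_features(path: str) -> np.ndarray:
    """Load a short segment of the clip and compute example spectral features."""
    duration = CLIP_SECONDS * ANALYSIS_FRACTION
    # Decoding stops after `duration` seconds, so only ~3 s are read instead of 30 s.
    y, sr = librosa.load(path, sr=22050, duration=duration)

    # Example spectral descriptors; the paper's exact feature set may differ.
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)
    rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

    # Summarize frame-wise features into a fixed-length vector (mean over frames).
    return np.concatenate(
        [centroid.mean(axis=1), rolloff.mean(axis=1), mfcc.mean(axis=1)]
    )

# Usage (hypothetical file path):
# features = extract_partial_features("clips/example_clip.wav")
```

The design point this sketch illustrates is that the cost saving comes from bounding the decoded duration up front, so both I/O and feature computation scale with the analysis window rather than with the full clip length.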