Gravitational wave detectors such as LIGO and Virgo are susceptible to various types of instrumental and environmental disturbances known as glitches which can mask and mimic gravitational waves. While there are 22 classes of non-Gaussian noise gradients currently identified, the number of classes is likely to increase as these detectors go through commissioning between observation runs. Since identification and labelling new noise gradients can be arduous and time-consuming, we propose $\beta$-Annelead VAEs to learn representations from spectograms in an unsupervised way. Using the same formulation as \cite{alemi2017fixing}, we view Bottleneck-VAEs~cite{burgess2018understanding} through the lens of information theory and connect them to $\beta$-VAEs~cite{higgins2017beta}. Motivated by this connection, we propose an annealing schedule for the hyperparameter $\beta$ in $\beta$-VAEs which has advantages of: 1) One fewer hyperparameter to tune, 2) Better reconstruction quality, while producing similar levels of disentanglement.