We present a self-supervised variational autoencoder (VAE) that jointly learns disentangled and dependent hidden factors, and then enhances disentangled representation learning with a self-supervised classifier that eliminates coupled representations in a contrastive manner. To this end, we introduce the Contrastive Copula VAE (C$^2$VAE), which neither relies on prior knowledge about the data in its probabilistic formulation nor imposes strong modeling assumptions on the posterior in its neural architecture. C$^2$VAE simultaneously factorizes the posterior in the evidence lower bound (ELBO) through a total correlation (TC)-driven decomposition, which yields factorized disentangled representations, and extracts the dependencies between hidden features with a neural Gaussian copula, which yields copula-coupled representations. A self-supervised contrastive classifier then differentiates the disentangled representations from the coupled representations; a contrastive loss regularizes this classification together with the TC loss to eliminate entangled factors and strengthen the disentangled representations. C$^2$VAE demonstrates a strong effect in enhancing disentangled representation learning. It further improves optimization by addressing the instability of TC-based VAEs and the trade-off between reconstruction and representation.
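
For orientation, the two ingredients named above, a TC-driven decomposition and a Gaussian copula, have standard textbook forms. The abstract does not state the exact objective used by C$^2$VAE, so the following is only a sketch under assumed notation that does not appear in the text: $q(z \mid x)$ for the encoder posterior, $q(z)$ for the aggregated posterior, a factorized prior $p(z) = \prod_j p(z_j)$, a correlation matrix $R$, and the standard normal CDF $\Phi$. Under these assumptions the KL term of the ELBO splits into a mutual information term, a total correlation term, and a dimension-wise KL term:

\[
\mathbb{E}_{p(x)}\!\left[ \mathrm{KL}\big( q(z \mid x) \,\|\, p(z) \big) \right]
= I_q(x; z)
+ \underbrace{\mathrm{KL}\Big( q(z) \,\Big\|\, \prod_j q(z_j) \Big)}_{\text{total correlation (TC)}}
+ \sum_j \mathrm{KL}\big( q(z_j) \,\|\, p(z_j) \big).
\]

A Gaussian copula over $d$ uniform marginals $u_1, \dots, u_d$ has density

\[
c(u_1, \dots, u_d; R)
= |R|^{-1/2} \exp\!\Big( -\tfrac{1}{2}\, \xi^{\top} \big( R^{-1} - I \big)\, \xi \Big),
\qquad \xi_j = \Phi^{-1}(u_j).
\]

When $R = I$ the copula density is identically $1$, so it carries no dependence between dimensions; any off-diagonal structure in $R$ captures coupling between hidden features. This is the sense in which a copula term can describe coupled representations, while penalizing the TC term pushes toward factorized ones.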