Facing the sparsity of user attributes on social networks, attribute inference aims to infer missing attributes from observed data and auxiliary information such as social connections between users. Recently, Variational Autoencoders (VAEs) have been successfully applied to this problem in a semi-supervised manner. However, the latent representations learned by the encoder contain either insufficient or irrelevant information: i) MLPs can reconstruct the observed input but fail to complete the missing part; ii) GNNs aggregate information along social connections but suffer from over-smoothing. Moreover, existing methods neglect to regularize the decoder; as a result, it lacks adequate inference ability and overfits severely. To address these issues, we propose an attribute inference model based on an adversarial VAE (Infer-AVAE). Our model deliberately unifies MLPs and GNNs in the encoder to learn dual latent representations: one encodes only the observed attributes of each user, while the other additionally aggregates information from the neighborhood. An adversarial network is then trained to leverage the differences between the two representations, and adversarial training guides the GNN with the MLP toward robust representations. Furthermore, a mutual-information constraint is introduced into the loss function to specifically train the decoder as a discriminator, so that it can make better use of the auxiliary information in the representations for attribute inference. Experiments on real-world social network datasets demonstrate that our model outperforms the state of the art by 7.0% in accuracy on average.
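The dual-representation idea can be illustrated with a toy sketch (all names and shapes below are hypothetical, not the authors' code): an MLP branch encodes each user from their own attributes only, while a GNN-style branch mean-aggregates one-hop neighbors before projecting, and the gap between the two outputs is what an adversarial discriminator could exploit.

```python
import numpy as np

rng = np.random.default_rng(0)

n_users, n_attrs, n_latent = 5, 8, 4
X = rng.random((n_users, n_attrs))            # observed attribute matrix (toy data)
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 0, 1, 0],
              [1, 0, 0, 0, 1],
              [0, 1, 0, 0, 1],
              [0, 0, 1, 1, 0]], dtype=float)  # symmetric adjacency: social ties

W = rng.standard_normal((n_attrs, n_latent))  # shared projection weights (illustrative)

def mlp_branch(X, W):
    """Encode each user from their own observed attributes only."""
    return np.tanh(X @ W)

def gnn_branch(X, A, W):
    """Mean-aggregate one-hop neighbors (with self-loops), then project."""
    A_hat = A + np.eye(len(A))
    deg = A_hat.sum(axis=1, keepdims=True)
    return np.tanh((A_hat / deg) @ X @ W)

z_self = mlp_branch(X, W)    # representation from the user's own attributes
z_nbr = gnn_branch(X, A, W)  # representation enriched with neighborhood information

# In the full model, an adversarial discriminator would be trained on the
# difference between the two branches; here we only measure that gap to show
# the representations genuinely differ.
gap = float(np.abs(z_self - z_nbr).mean())
print(z_self.shape, z_nbr.shape, gap > 0)
```

This omits the VAE sampling, the decoder, and the mutual-information term; it only shows why the two branches yield distinct views of the same user that adversarial training can play off against each other.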