Despite the growing popularity of deep learning, high memory requirements and power consumption substantially limit its application in mobile and IoT settings. Binary convolutional networks can alleviate these problems, but the limited bit width of their weights often leads to a significant loss of prediction accuracy. In this paper, we present a method for training binary networks that keeps their information capacity at a stable, predefined level throughout training by applying a Shannon-entropy-based penalty to the convolutional filters. Experiments on the SVHN, CIFAR, and ImageNet datasets demonstrate that the proposed approach can yield a statistically significant improvement in the accuracy of binary networks.
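
To make the idea of an entropy-based penalty on binary filters concrete, the following is a minimal illustrative sketch, not the formulation used in this paper. It assumes that each filter is binarized by the sign of its weights, that a filter's information capacity is measured by the Shannon entropy of its +1/-1 distribution, and that the penalty is the squared deviation from a target entropy. The names `binary_entropy`, `entropy_penalty`, and `target_entropy` are hypothetical.

```python
import numpy as np


def binary_entropy(p, eps=1e-12):
    """Shannon entropy (in bits) of a Bernoulli variable with P(+1) = p."""
    p = np.clip(p, eps, 1.0 - eps)  # avoid log2(0)
    return -(p * np.log2(p) + (1.0 - p) * np.log2(1.0 - p))


def entropy_penalty(filters, target_entropy=1.0):
    """Mean squared gap between per-filter entropy and a target level.

    filters: array of shape (num_filters, ...) holding real-valued weights.
    Assumption: a weight is binarized to +1 when it is >= 0, else to -1.
    """
    flat = filters.reshape(filters.shape[0], -1)
    p_pos = (flat >= 0).mean(axis=1)  # fraction of +1 weights per filter
    h = binary_entropy(p_pos)         # entropy of each binarized filter
    return float(np.mean((h - target_entropy) ** 2))


# Usage: a balanced filter versus a filter whose weights all share one sign.
balanced = np.array([[1.0, -1.0, 1.0, -1.0]])
constant = np.ones((1, 4))
penalty_balanced = entropy_penalty(balanced)
penalty_constant = entropy_penalty(constant)
```

Under these assumptions, a filter with equal numbers of +1 and -1 weights has an entropy of 1 bit and incurs no penalty when the target is 1 bit, while a filter whose weights all share one sign has near-zero entropy and is penalized most. Because this version counts signs directly, it is piecewise constant in the weights and only illustrates the quantity being measured; using it as a training loss would require a differentiable surrogate.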