Abstract: Crowd management technologies that leverage computer vision are now widespread. These methods have many security-related applications, including tracking the flow of groups of people and monitoring large gatherings. In this paper, we propose an accurate monitoring system composed of two concatenated convolutional deep learning architectures. The first part, called the Front-end, is responsible for converting bi-dimensional signals into high-level features. The second part, called the Back-end, is a dilated Convolutional Neural Network (CNN) that uses dilated convolutions in place of pooling layers. It is responsible for enlarging the receptive field of the whole network and converting the descriptors provided by the first network into a saliency map that is used to estimate the number of people in highly congested images. We also propose using a genetic algorithm to find an optimized dilation rate configuration for the Back-end. The proposed model is shown to converge 30\% faster than state-of-the-art approaches. It also achieves a 20\% lower Mean Absolute Error (MAE) when applied to the Shanghai dataset.
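
To illustrate why the dilation rate configuration matters for the receptive field, the following minimal sketch computes the per-axis receptive field of a stack of stride-1 dilated convolutions. The function name `receptive_field`, the 3x3 kernel size, and the example dilation rates are assumptions chosen for illustration; they are not the configuration used in the paper.

```python
def receptive_field(dilation_rates, kernel_size=3):
    """Per-axis receptive field of stacked stride-1 dilated convolutions.

    Each layer with kernel size k and dilation d widens the receptive
    field by (k - 1) * d pixels, without any pooling or striding.
    """
    rf = 1  # a single output pixel sees one input pixel before any layer
    for d in dilation_rates:
        rf += (kernel_size - 1) * d
    return rf


# Hypothetical configurations (illustrative only, not taken from the paper).
plain = receptive_field([1, 1, 1])    # ordinary 3x3 convolutions
dilated = receptive_field([2, 2, 2])  # uniform dilation rate of 2
mixed = receptive_field([1, 2, 4])    # increasing dilation rates

print(plain, dilated, mixed)  # prints: 7 13 15
```

With the same number of layers and parameters, the choice of dilation rates alone changes the receptive field, which is the kind of configuration space a search procedure such as a genetic algorithm could explore.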