Visual sensor networks constitute a fundamental class of distributed sensing systems and raise distinctive research questions in terms of complexity and performance. One such challenge is the identification of the network stimulation model (SM), which arises when a set of detectable events triggers different subsets of the cameras. This work formulates the SM identification problem and proposes a method for generating the corresponding network observations. An approach based on deep embedded features and soft clustering is then used to solve the identification problem. Specifically, Gaussian mixture modeling provides a suitable description of the data distribution, while an autoencoder mitigates the undesired effects of the so-called curse of dimensionality. It is shown that an SM can be learnt by solving a maximum a posteriori (MAP) estimation problem on the encoded features, which belong to a lower-dimensional space. Lastly, numerical results are reported to validate the devised estimation algorithm.
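
As an illustration of the soft-clustering step, the following minimal sketch computes the posterior probability of each Gaussian mixture component for a few encoded feature vectors and selects the most probable one. It rests on several assumptions that the abstract does not state: the autoencoder is already trained and has produced the low-dimensional features, the mixture parameters are given, covariances are diagonal, and the MAP step is read as choosing the component with the highest posterior. All names and numerical values (`responsibilities`, `map_component`, `weights`, `means`, `variances`, `encoded`) are hypothetical.

```python
import math


def log_gaussian_diag(z, mean, var):
    """Log-density of a diagonal-covariance Gaussian evaluated at point z."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(z, mean, var)
    )


def responsibilities(z, weights, means, variances):
    """Posterior probability of each mixture component given z (soft clustering)."""
    log_joint = [
        math.log(w) + log_gaussian_diag(z, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    # Log-sum-exp normalisation for numerical stability.
    top = max(log_joint)
    log_norm = top + math.log(sum(math.exp(lj - top) for lj in log_joint))
    return [math.exp(lj - log_norm) for lj in log_joint]


def map_component(z, weights, means, variances):
    """Index of the component with the highest posterior probability."""
    post = responsibilities(z, weights, means, variances)
    return max(range(len(post)), key=post.__getitem__)


# Hypothetical two-component mixture in a 2-D encoded space (illustrative values only).
weights = [0.5, 0.5]
means = [[0.0, 0.0], [4.0, 4.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]

# Hypothetical encoded observations, standing in for the autoencoder output.
encoded = [[0.2, -0.1], [3.8, 4.3], [2.0, 2.0]]
labels = [map_component(z, weights, means, variances) for z in encoded]
```

In this reading, the responsibilities carry the soft assignment, while `map_component` collapses it to a hard decision; an observation equidistant from both means, such as the third one, receives nearly equal responsibilities.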