In this work we present the modular Crowd Simulation Evaluation through Composition framework (CSEC) which provides a quantitative comparison between different pedestrian and crowd simulation approaches. Evaluation is made based on the comparison of source footage against synthetic video created through novel composition techniques. The proposed framework seeks to reduce the complexity of simulation evaluation and provide a platform from which the comparison of differing simulation algorithms as well as parametric tuning can be conducted to improve simulation accuracy or providing measures of similarity between crowd simulation algorithms and source data. Through the use of features designed to mimic the Human Visual System (HVS), specific simulation properties can be evaluated relative to sample footage. Validation was performed on a number of popular crowd datasets and through comparisons of multiple pedestrian and crowd simulation algorithms.