With the rising prevalence of cardiovascular and respiratory disorders and an aging global population, healthcare systems face increasing pressure to adopt efficient non-contact vital sign monitoring (NCVSM) solutions. This study introduces a robust framework for multi-person localization and vital sign monitoring using multiple-input multiple-output (MIMO) frequency-modulated continuous-wave (FMCW) radar, which addresses the challenges of real-world, cluttered environments. Two key contributions are presented. First, a custom hardware phantom was developed to simulate multi-person NCVSM scenarios; it replays recorded thoracic impedance signals to reproduce realistic cardiopulmonary dynamics. The phantom enables rapid, repeatable validation of radar systems and algorithms under diverse conditions, accelerating their deployment for human monitoring. Second, with the aid of the phantom, we designed a robust multi-person localization algorithm that exploits joint sparsity and cardiopulmonary properties, together with a harmonics-resilient, dictionary-based vital sign estimation method that mitigates interference from respiration harmonics. Additionally, an adaptive signal refinement procedure is introduced that improves the accuracy of continuous NCVSM by exploiting the continuity of successive estimates. The framework was validated and compared with existing techniques in 12 phantom trials and 12 human trials covering both single- and multi-person scenarios, and it achieved superior localization and NCVSM performance. For example, in multi-person human trials, our method achieved average respiration rate estimation accuracies of 94.14%, 98.12%, and 98.69% within error thresholds of 2, 3, and 4 breaths per minute, respectively, and average heart rate estimation accuracies of 87.10%, 94.12%, and 95.54% within error thresholds of 2, 3, and 4 beats per minute, respectively. These results highlight the potential of this framework for reliable multi-person NCVSM in healthcare and IoT applications.
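As a rough illustration of why respiration harmonics complicate heart rate estimation, the following Python sketch uses a least-squares dictionary of harmonic sinusoid atoms: it first selects the respiration rate whose fundamental and harmonics explain the most signal energy, subtracts that fit, and then searches the residual for the heart rate. This is a generic, hypothetical example and not the dictionary or algorithm used in this work; the function names (`harmonic_atoms`, `best_rate`, `estimate_rr_hr`), the search bands of 6 to 30 breaths per minute and 50 to 120 beats per minute, the four-harmonic respiration model, and the synthetic test signal are all assumptions made for illustration.

```python
import numpy as np


def harmonic_atoms(rate_per_min, t, n_harmonics):
    """Hypothetical helper: cos/sin columns at a rate and its first n_harmonics multiples."""
    f0 = rate_per_min / 60.0  # convert per-minute rate to Hz
    cols = []
    for k in range(1, n_harmonics + 1):
        cols.append(np.cos(2 * np.pi * k * f0 * t))
        cols.append(np.sin(2 * np.pi * k * f0 * t))
    return np.column_stack(cols)


def best_rate(x, t, rates_per_min, n_harmonics):
    """Pick the candidate rate whose atoms capture the most energy of x (least-squares fit)."""
    best, best_energy, best_fit = None, -np.inf, None
    for r in rates_per_min:
        A = harmonic_atoms(r, t, n_harmonics)
        coef, *_ = np.linalg.lstsq(A, x, rcond=None)
        fit = A @ coef  # projection of x onto this candidate's atoms
        energy = float(fit @ fit)
        if energy > best_energy:
            best, best_energy, best_fit = r, energy, fit
    return best, best_fit


def estimate_rr_hr(x, fs, n_resp_harmonics=4):
    """Estimate RR, remove its fundamental and harmonics, then estimate HR from the residual."""
    t = np.arange(len(x)) / fs
    # Assumed respiration band: 6-30 breaths/min, modeled with n_resp_harmonics harmonics.
    rr, resp_fit = best_rate(x, t, range(6, 31), n_resp_harmonics)
    residual = x - resp_fit  # respiration fundamental and harmonics removed
    # Assumed heart band: 50-120 beats/min, modeled as a single sinusoid.
    hr, _ = best_rate(residual, t, range(50, 121), 1)
    return rr, hr


# Synthetic displacement: 15 breaths/min with harmonics plus a weaker 72 beats/min cardiac term.
fs = 20.0
t = np.arange(int(30 * fs)) / fs  # 30 s window
resp = (1.0 * np.cos(2 * np.pi * 0.25 * t) + 0.3 * np.cos(2 * np.pi * 0.5 * t)
        + 0.2 * np.cos(2 * np.pi * 0.75 * t) + 0.15 * np.cos(2 * np.pi * 1.0 * t))
heart = 0.1 * np.cos(2 * np.pi * 1.2 * t)
x = resp + heart + 0.05 * np.random.default_rng(0).standard_normal(t.size)
print(estimate_rr_hr(x, fs))
```

In this synthetic signal the fourth respiration harmonic lies at 60 per minute, inside the assumed heart rate band and stronger than the 72 per minute cardiac component, so a plain spectral peak search in that band could lock onto the harmonic. Removing the fitted respiration harmonics before searching for the heart rate avoids that failure mode in this toy setting.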