When designing swarm-robotic systems, algorithms from different domains must be compared systematically to determine which can scale to the target problem size and operating conditions. We propose a set of quantitative metrics for scalability, flexibility, and emergence that address this need during the system design process. We demonstrate the applicability of the proposed metrics as a design tool by solving a large object gathering problem under temporally varying operating conditions using iterative hypothesis evaluation. We provide experimental results obtained in simulation for swarms of over 10,000 robots.