Snapshot compressive imaging (SCI) refers to the recovery of three-dimensional data cubes-such as videos or hyperspectral images-from their two-dimensional projections, which are generated by a special encoding of the data with a mask. SCI systems commonly use binary-valued masks that follow certain physical constraints. Optimizing these masks subject to these constraints is expected to improve system performance. However, prior theoretical work on SCI systems focuses solely on independently and identically distributed (i.i.d.) Gaussian masks, which do not permit such optimization. On the other hand, existing practical mask optimizations rely on computationally intensive joint optimizations that provide limited insight into the role of masks and are expected to be sub-optimal due to the non-convexity and complexity of the optimization. In this paper, we analytically characterize the performance of SCI systems employing binary masks and leverage our analysis to optimize hardware parameters. Our findings provide a comprehensive and fundamental understanding of the role of binary masks - with both independent and dependent elements - and their optimization. We also present simulation results that confirm our theoretical findings and further illuminate different aspects of mask design.