Deploying Convolutional Neural Networks (CNNs) on edge platforms necessitates efficient hardware acceleration. Any unnecessary data movement in such accelerators can unacceptably degrade performance and efficiency. To address this, we develop a layer fusion technique targeting CNNs, that reduces off-chip data communication using a Genetic Algorithm (GA) applied to graph-based topological sort. Results show a 1.8$\times$ increase in energy efficiency and 1.9$\times$ improvement in energy-delay product (EDP) for MobileNet-v3 on a SIMBA-like mobile architecture. Our approach consistently improves workload performance, averaging 1.4$\times$ improvement to EDP for SIMBA and 1.12$\times$ for Eyeriss.