Facial Expression Recognition (FER) plays an important role in human-computer interactions and is used in a wide range of applications. Convolutional Neural Networks (CNN) have shown promise in their ability to classify human facial expressions, however, large CNNs are not well-suited to be implemented on resource- and energy-constrained IoT devices. In this work, we present a hierarchical framework for developing and optimizing hardware-aware CNNs tuned for deployment at the edge. We perform a comprehensive analysis across various edge AI accelerators including NVIDIA Jetson Nano, Intel Neural Compute Stick, and Coral TPU. Using the proposed strategy, we achieved a peak accuracy of 99.49% when testing on the CK+ facial expression recognition dataset. Additionally, we achieved a minimum inference latency of 0.39 milliseconds and a minimum power consumption of 0.52 Watts.