This paper addresses a distributed leader-follower formation control problem for a group of agents, each using a body-fixed camera with a limited field of view (FOV) for state estimation. The main challenge arises from the need to coordinate the agents' movements with their cameras' FOV to maintain visibility of the leader for accurate and reliable state estimation. To address this challenge, we propose a novel perception-aware distributed leader-follower safe control scheme that incorporates FOV limits as state constraints. A Control Barrier Function (CBF) based quadratic program is employed to ensure the forward invariance of a safety set defined by these constraints. Furthermore, new neural network based and double bounding boxes based estimators, combined with temporal filters, are developed to estimate system states directly from real-time image data, providing consistent performance across various environments. Comparison results in the Gazebo simulator demonstrate the effectiveness and robustness of the proposed framework in two distinct environments.