We present a novel control strategy that enables a team of unmanned aerial vehicles (UAVs) to autonomously achieve a desired formation using only visual feedback from the UAVs' onboard cameras, eliminating the need for global position measurements. The proposed pipeline is fully distributed and includes a collision avoidance scheme. In our approach, each UAV extracts feature points from its captured images and communicates their pixel coordinates and descriptors to its neighbors. Our novel pose estimation algorithm, QuEst, uses these feature points to localize the neighboring UAVs. Compared to existing methods, QuEst achieves higher estimation accuracy and is robust to feature point degeneracies. We demonstrate the proposed pipeline in a high-fidelity simulation environment and show that the UAVs can achieve a desired formation in a natural environment without any fiducial markers.