We analyse the effectiveness of RMSE, PSNR, SSIM and FOM for evaluating edge detection algorithms used for automated coastline detection. Typically, the accuracy of detected coastlines is assessed visually. This is impractical at large scale, which creates a need for objective evaluation metrics. We therefore conduct an experiment to identify reliable metrics. We apply Canny edge detection to 95 coastline satellite images across 49 testing locations, vary the hysteresis thresholds, and compare the metric values to a visual analysis of the detected edges. We find that FOM is the most reliable metric for selecting the best threshold: it selects a better threshold 92.6% of the time and the best threshold 66.3% of the time. In comparison, RMSE, PSNR and SSIM select the best threshold only 6.3%, 6.3% and 11.6% of the time, respectively. We explain these results by reformulating RMSE, PSNR and SSIM in terms of confusion matrix measures. This reformulation suggests that these metrics not only fail in this experiment but are not useful for evaluating edge detection in general.
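To illustrate the kind of reformulation mentioned above, the following minimal sketch shows how RMSE and PSNR reduce to functions of confusion matrix counts when both edge maps are binary (pixel values 0 and 1). This is an illustration under that assumption, not necessarily the derivation used in the paper; the function names and the toy arrays are hypothetical, and SSIM is omitted.

```python
import numpy as np


def confusion_counts(reference, detected):
    """Count TP, FP, FN, TN between two binary edge maps (assumed 0/1 valued)."""
    ref = np.asarray(reference, dtype=bool)
    det = np.asarray(detected, dtype=bool)
    tp = int(np.sum(ref & det))
    fp = int(np.sum(~ref & det))
    fn = int(np.sum(ref & ~det))
    tn = int(np.sum(~ref & ~det))
    return tp, fp, fn, tn


def rmse_pixelwise(reference, detected):
    """RMSE computed directly from pixel differences."""
    diff = np.asarray(reference, dtype=float) - np.asarray(detected, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def rmse_from_confusion(tp, fp, fn, tn):
    """For 0/1 maps every mismatch contributes a squared error of 1,
    so RMSE = sqrt((FP + FN) / N)."""
    n = tp + fp + fn + tn
    return float(np.sqrt((fp + fn) / n))


def psnr_from_confusion(tp, fp, fn, tn, max_value=1.0):
    """PSNR = 10 * log10(MAX^2 / MSE) with MSE = (FP + FN) / N."""
    n = tp + fp + fn + tn
    errors = fp + fn
    if errors == 0:
        return float("inf")  # identical maps
    return float(10.0 * np.log10(max_value ** 2 * n / errors))


# Hypothetical toy example: a vertical reference edge and an imperfect detection.
reference = np.array([[0, 1, 0, 0],
                      [0, 1, 0, 0],
                      [0, 1, 0, 0],
                      [0, 1, 0, 0]])
detected = np.array([[0, 0, 1, 0],
                     [0, 1, 0, 0],
                     [0, 1, 0, 0],
                     [0, 0, 0, 0]])

counts = confusion_counts(reference, detected)
print(counts)
print(rmse_pixelwise(reference, detected), rmse_from_confusion(*counts))
print(psnr_from_confusion(*counts))
```

In this form, RMSE and PSNR depend only on the total number of mismatched pixels, so an edge displaced by one pixel is penalised the same as one displaced far from the true coastline. FOM, by contrast, weights detected pixels by their distance to the nearest reference edge, which is one plausible reading of why it behaves differently.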