Despite significant improvements over the last few years, cloud-based healthcare applications continue to suffer from poor adoption because they struggle to meet stringent security, privacy, and quality-of-service requirements (such as low latency). Edge computing, together with distributed machine learning techniques such as federated learning, has gained popularity as a viable solution in such settings. In this paper, we leverage edge computing in medicine by analyzing and evaluating the potential of intelligent processing of clinical visual data at the edge, allowing remote healthcare centers that lack advanced diagnostic facilities to benefit securely from multi-modal data. To this end, we use the emerging concept of clustered federated learning (CFL) for the automatic diagnosis of COVID-19. Such an automated system can help reduce the burden on healthcare systems across the world, which have been under considerable stress since the COVID-19 pandemic emerged in late 2019. We evaluate the performance of the proposed framework under different experimental setups on two benchmark datasets. The results on both datasets are promising: the framework performs comparably to the central baseline, in which specialized models (i.e., each trained on a specific type of COVID-19 imagery) are trained with central data, and it improves overall F1-scores by 16\% and 11\% over the multi-modal model trained in the conventional federated learning setup on the X-ray and ultrasound datasets, respectively. We also discuss in detail the associated challenges, technologies, tools, and techniques available for deploying ML at the edge in such privacy- and delay-sensitive applications.