Federated Learning (FL) is a recent approach for collaboratively training Machine Learning models on mobile edge devices, without private user data ever leaving the devices. The popular FL algorithm, Federated Averaging (FedAvg), suffers from poor convergence speed when user data are non-iid. Furthermore, most existing work on FedAvg measures central-model accuracy, but in many cases, such as user content recommendation, improving individual User model Accuracy (UA) is the real objective. To address these issues, we propose a Multi-Task Federated Learning (MTFL) system, which converges faster than FedAvg by using distributed Adam optimization (FedAdam), and improves UA by introducing personal, non-federated 'patch' Batch-Normalization (BN) layers into the model. Testing FedAdam on the MNIST and CIFAR10 datasets shows that it converges faster (up to 5x) than FedAvg in non-iid scenarios, and experiments using MTFL on the CIFAR10 dataset show that MTFL significantly improves average UA over FedAvg, by up to 54%. We also analyse the effect that private BN patches have on the MTFL model during inference, and give evidence that MTFL strikes a better balance between regularization and convergence in FL. Finally, we test the MTFL system on a mobile edge computing testbed, showing that MTFL's convergence and UA benefits outweigh its overhead.
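To make the BN-patch idea concrete, the following is a minimal, illustrative sketch (not the paper's reference implementation) of how per-client Batch-Normalization layers could be excluded from federated aggregation while the remaining parameters are averaged. It assumes a PyTorch model whose BN modules have "bn" in their parameter names; the function names are hypothetical.

```python
# Illustrative sketch: keep per-client BN "patch" layers private while
# federating the remaining parameters (assumed naming convention: ".bn").
import copy
import torch
import torch.nn as nn

def split_state(model: nn.Module):
    """Split a model's state_dict into federated and private (BN patch) parts."""
    fed, private = {}, {}
    for name, tensor in model.state_dict().items():
        (private if ".bn" in name else fed)[name] = tensor.clone()
    return fed, private

def average_states(client_states: list):
    """FedAvg-style element-wise mean of the clients' federated parameters."""
    avg = copy.deepcopy(client_states[0])
    for key in avg:
        avg[key] = torch.stack([s[key].float() for s in client_states]).mean(dim=0)
    return avg

def apply_global_update(model: nn.Module, global_fed_state: dict):
    """Load aggregated federated weights; local BN patches stay untouched
    because they are absent from global_fed_state (strict=False)."""
    model.load_state_dict(global_fed_state, strict=False)
```

In this sketch each client would call split_state after local training, upload only the federated part, and later call apply_global_update with the server's aggregate, so its BN statistics and affine parameters remain personal.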