Universal unitary photonic devices are capable of applying arbitrary unitary transformations to multi-port coherent light inputs and provide a promising hardware platform for fast and energy-efficient machine learning. We address the problem of training universal photonic devices composed of meshes of tunable beamsplitters to learn unknown unitary matrices. The locally interacting nature of the mesh components limits the fidelity of the learned matrices if phase shifts are randomly initialized. We propose an initialization procedure derived from the Haar measure over unitary matrices that overcomes this limitation. We also embed various model architectures within a standard rectangular mesh "canvas," and our numerical experiments show significantly improved scalability and training speed, even in the presence of fabrication errors.
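To make the mesh picture concrete, the following is a minimal sketch of how a rectangular arrangement of tunable beamsplitters composes into an N-port unitary. It is an illustration under stated assumptions and not the implementation used in this work: the 2x2 parametrization in `beamsplitter` is one common Mach-Zehnder-style convention, the function names are hypothetical, and the phases are drawn uniformly at random purely for demonstration (this is the naive initialization, not the Haar-derived procedure).

```python
import cmath
import math
import random


def beamsplitter(theta, phi):
    """2x2 unitary of one tunable beamsplitter (assumed MZI-style convention)."""
    s, c = math.sin(theta / 2), math.cos(theta / 2)
    g = 1j * cmath.exp(1j * theta / 2)  # global phase factor
    p = cmath.exp(1j * phi)             # input phase shift
    return [[g * p * s, g * c],
            [g * p * c, -g * s]]


def embed(u2, n, m):
    """Place a 2x2 block on ports m and m+1 of an n x n identity (local interaction)."""
    t = [[complex(i == j) for j in range(n)] for i in range(n)]
    t[m][m], t[m][m + 1] = u2[0][0], u2[0][1]
    t[m + 1][m], t[m + 1][m + 1] = u2[1][0], u2[1][1]
    return t


def matmul(a, b):
    """Plain matrix product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def random_phases(n, rng):
    """Uniform random (theta, phi) pairs per layer; demonstration only."""
    return [[(rng.uniform(0, 2 * math.pi), rng.uniform(0, 2 * math.pi))
             for _ in range(layer % 2, n - 1, 2)]
            for layer in range(n)]


def rectangular_mesh(n, phases):
    """Compose n alternating layers of nearest-neighbor beamsplitters."""
    u = [[complex(i == j) for j in range(n)] for i in range(n)]
    for layer in range(n):
        # Even layers couple ports (0,1), (2,3), ...; odd layers (1,2), (3,4), ...
        for idx, m in enumerate(range(layer % 2, n - 1, 2)):
            theta, phi = phases[layer][idx]
            u = matmul(embed(beamsplitter(theta, phi), n, m), u)
    return u


def unitarity_error(u):
    """Largest entry of |U^dagger U - I| in absolute value."""
    n = len(u)
    return max(abs(sum(u[k][i].conjugate() * u[k][j] for k in range(n))
                   - (1 if i == j else 0))
               for i in range(n) for j in range(n))


phases = random_phases(4, random.Random(0))
mesh = rectangular_mesh(4, phases)
print(unitarity_error(mesh))  # should be at the level of floating-point round-off
```

In this sketch a 4-port mesh uses N(N-1)/2 = 6 beamsplitters, each acting only on two adjacent ports, so any coupling between distant ports has to be built up through several layers.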