Imaging sites around the world generate growing amounts of medical scan data with ever more versatile and affordable technology. Large-scale studies acquire MRI for tens of thousands of participants, together with metadata ranging from lifestyle questionnaires to biochemical assays, genetic analyses and more. These large datasets encode substantial information about human health and hold considerable potential for machine learning training and analysis. This chapter examines ongoing large-scale studies and the challenge of distribution shifts between them. Transfer learning for overcoming such shifts is discussed, together with federated learning for safe access to distributed training data securely held at multiple institutions. Finally, representation learning is reviewed as a methodology for encoding embeddings that express abstract relationships in multi-modal input formats.