Heavy goods vehicles are vital backbones of the supply chain delivery system but also contribute significantly to carbon emissions with only 60% loading efficiency in the United Kingdom. Collaborative vehicle routing has been proposed as a solution to increase efficiency, but challenges remain to make this a possibility. One key challenge is the efficient computation of viable solutions for co-loading and routing. Current operations research methods suffer from non-linear scaling with increasing problem size and are therefore bound to limited geographic areas to compute results in time for day-to-day operations. This only allows for local optima in routing and leaves global optimisation potential untouched. We develop a reinforcement learning model to solve the three-dimensional loading capacitated vehicle routing problem in approximately linear time. While this problem has been studied extensively in operations research, no publications on solving it with reinforcement learning exist. We demonstrate the favourable scaling of our reinforcement learning model and benchmark our routing performance against state-of-the-art methods. The model performs within an average gap of 3.83% to 8.10% compared to established methods. Our model not only represents a promising first step towards large-scale logistics optimisation with reinforcement learning but also lays the foundation for this research stream.