In this work we extend previous analyses of linguistic networks by adopting a multi-layer network framework for modelling the human mental lexicon, i.e. an abstract mental repository where words and concepts are stored together with their linguistic patterns. Across a three-layer linguistic multiplex, we model English words as nodes and connect them according to (i) phonological similarities, (ii) synonym relationships and (iii) free word associations. Our main aim is to exploit this multi-layered structure to explore the influence of phonological and semantic relationships on lexicon assembly over time. We propose a model of lexicon growth which is driven by the phonological layer: words are suggested according to different orderings of insertion (e.g. shorter word length, highest frequency, semantic multiplex features) and accepted or rejected subject to constraints. We then measure times of network assembly and compare these to empirical data about the age of acquisition of words. In agreement with empirical studies in psycholinguistics, our results provide quantitative evidence for the hypothesis that word acquisition is driven by features at multiple levels of organisation within language.