In this article we propose feature graph architectures (FGA), which are deep learning systems employing a structured initialisation and training method based on a feature graph which facilitates improved generalisation performance compared with a standard shallow architecture. The goal is to explore alternative perspectives on the problem of deep network training. We evaluate FGA performance for deep SVMs on some experimental datasets, and show how generalisation and stability results may be derived for these models. We describe the effect of permutations on the model accuracy, and give a criterion for the optimal permutation in terms of feature correlations. The experimental results show that the algorithm produces robust and significant test set improvements over a standard shallow SVM training method for a range of datasets. These gains are achieved with a moderate increase in time complexity.