Signal models based on sparsity, low-rank and other properties have been exploited for image reconstruction from limited and corrupted data in medical imaging and other computational imaging applications. In particular, sparsifying transform models have shown promise in various applications, and offer numerous advantages such as efficiencies in sparse coding and learning. This work investigates pre-learning a multi-layer extension of the transform model for image reconstruction, wherein the transform domain or filtering residuals of the image are further sparsified over the layers. The residuals from multiple layers are jointly minimized during learning, and in the regularizer for reconstruction. The proposed block coordinate descent optimization algorithms involve highly efficient updates. Preliminary numerical experiments demonstrate the usefulness of a two-layer model over the previous related schemes for CT image reconstruction from low-dose measurements.