As intelligent reflecting surface (IRS) has emerged as a new and promising technology capable of configuring the wireless environment favorably, channel estimation for IRS-assisted multiple-input multiple-output (MIMO) systems has garnered extensive attention in recent years. While various algorithms have been proposed to address this challenge, there is a lack of rigorous theoretical error analysis. This paper aims to address this gap by providing theoretical guarantees in terms of stable recovery of channel matrices for noisy measurements. We begin by establishing the equivalence between IRS-assisted MIMO systems and a compact tensor train (TT)-based tensor-on-tensor (ToT) regression. Building on this equivalence, we then investigate the restricted isometry property (RIP) for complex-valued subgaussian measurements. Our analysis reveals that successful recovery hinges on the relationship between the number of user terminals (in the uplink scenario) or base stations (in the downlink scenario) and the number of time slots during which channel matrices remain invariant. Utilizing the RIP condition, we analyze the theoretical recovery error for the solution to a constrained least-squares optimization problem, including upper error bound and minimax lower bound, demonstrating that the error decreases inversely with the number of time slots and increases proportionally with the number of unknown elements in the channel matrices. In addition, we extend our error analysis to two more specialized IRS-assisted MIMO systems, incorporating low-rank channel matrices or an unknown IRS. Furthermore, we explore a multi-hop IRS scheme and analyze the corresponding recovery errors. Finally, we introduce and implement two nonconvex optimization algorithms--alternating least squares and alternating gradient descent--to validate our conclusions through simulations.