Abstract: The objective of this paper is to explore how financial big data and machine learning methods can be applied to model and understand complex financial products. We focus on residential mortgage-backed securities (resMBS), which were at the heart of the 2008 US financial crisis. The securities are contained within a prospectus and have a complex payoff structure. Multiple financial institutions form a supply chain to create the prospectuses. We provide insight into the performance of resMBS securities through a series of increasingly complex models. First, security-level models directly identify salient features of resMBS securities that impact their performance. Second, we extend the model to include prospectus-level features. We are the first to demonstrate that the composition of the prospectus is associated with the performance of securities. Finally, to develop a deeper understanding of the role of the supply chain, we use unsupervised probabilistic methods, in particular dynamic topic models (DTM), to understand community formation and temporal evolution along the chain. A comprehensive model provides insight into the impact of DTM communities on the issuance and evolution of prospectuses, and eventually on the performance of resMBS securities.