Abstract: The advent of cost-effective cloud computing over the past decade and the ever-growing accumulation of high-fidelity clinical data in modern hospital settings are creating new opportunities for translational medicine. Machine learning is driving the research community's appetite for signal data such as patient vitals. Health care systems, however, are ill suited to processing data at this scale. In addition, because of the sheer volume of data being collected, it is not feasible for health care systems to retain all of it in perpetuity. This gold mine of information is purged periodically, forfeiting invaluable future research opportunities. We have developed a highly scalable solution that: a) siphons off patient vital-sign data on a nightly basis from on-premises biomedical systems to a cloud storage location as a permanent archive, b) reconstructs the database in the cloud, c) generates waveforms, alarms, and numeric data in a research-ready format, and d) uploads the processed data to a cloud storage location ready for research. The data is de-identified and catalogued so that it can be joined with Electronic Medical Records (EMR) and other ancillary data types such as electroencephalography (EEG), radiology, and video monitoring. This approach removes the research processing burden from health care systems. The solution is used to process high-density patient monitoring data aggregated by the Philips Patient Information Center iX (PIC iX) hospital surveillance system for archival storage in the Philips Data Warehouse Connect enterprise-level database. It is one component of a broader secure, high-performance clinical data science platform.
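The steps a) through d) above describe a nightly extract-archive-process pipeline. The following is a minimal sketch of that flow, assuming hypothetical local paths and function names; the actual exports come from the PIC iX and Data Warehouse Connect systems, whose interfaces are not described in this abstract.

# Minimal sketch of the nightly archival pipeline (steps a-d above).
# All paths and function names are hypothetical placeholders.

import datetime
import pathlib
import shutil


def export_nightly_vitals(source_dir: pathlib.Path, archive_dir: pathlib.Path) -> pathlib.Path:
    """Step a: copy the previous night's vital-sign export to a permanent archive location."""
    stamp = datetime.date.today().isoformat()
    target = archive_dir / f"vitals-{stamp}"
    shutil.copytree(source_dir, target)
    return target


def reconstruct_database(archive: pathlib.Path) -> list[pathlib.Path]:
    """Step b: locate the archived database files to be restored in the cloud environment."""
    return sorted(archive.glob("*.bak"))


def generate_research_formats(db_files: list[pathlib.Path], out_dir: pathlib.Path) -> None:
    """Steps c-d: emit waveforms, alarms, and numerics as research-ready files
    and stage them for upload to cloud storage."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for f in db_files:
        # Placeholder: real processing would parse the waveform, alarm, and numeric tables.
        (out_dir / (f.stem + ".csv")).write_text("timestamp,signal,value\n")


if __name__ == "__main__":
    # Hypothetical mount points for the on-premises export and the cloud-backed archive.
    archive = export_nightly_vitals(pathlib.Path("/exports/picix"), pathlib.Path("/archive"))
    generate_research_formats(reconstruct_database(archive), pathlib.Path("/research-ready"))

In practice, each stage would run as an independently scalable job so that archiving, database reconstruction, and research-format generation can proceed without loading the clinical systems.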