Abstract:We introduce Open-Sora Plan, an open-source project that aims to contribute a large generation model for generating desired high-resolution videos with long durations based on various user inputs. Our project comprises multiple components for the entire video generation process, including a Wavelet-Flow Variational Autoencoder, a Joint Image-Video Skiparse Denoiser, and various condition controllers. Moreover, many assistant strategies for efficient training and inference are designed, and a multi-dimensional data curation pipeline is proposed for obtaining desired high-quality data. Benefiting from efficient thoughts, our Open-Sora Plan achieves impressive video generation results in both qualitative and quantitative evaluations. We hope our careful design and practical experience can inspire the video generation research community. All our codes and model weights are publicly available at \url{https://github.com/PKU-YuanGroup/Open-Sora-Plan}.
Abstract:Effectively representing medical concepts and patients is important for healthcare analytical applications. Representing medical concepts for healthcare analytical tasks requires incorporating medical domain knowledge and prior information from patient description data. Current methods, such as feature engineering and mapping medical concepts to standardized terminologies, have limitations in capturing the dynamic patterns from patient description data. Other embedding-based methods have difficulties in incorporating important medical domain knowledge and often require a large amount of training data, which may not be feasible for most healthcare systems. Our proposed framework, MD-Manifold, introduces a novel approach to medical concept and patient representation. It includes a new data augmentation approach, concept distance metric, and patient-patient network to incorporate crucial medical domain knowledge and prior data information. It then adapts manifold learning methods to generate medical concept-level representations that accurately reflect medical knowledge and patient-level representations that clearly identify heterogeneous patient cohorts. MD-Manifold also outperforms other state-of-the-art techniques in various downstream healthcare analytical tasks. Our work has significant implications in information systems research in representation learning, knowledge-driven machine learning, and using design science as middle-ground frameworks for downstream explorative and predictive analyses. Practically, MD-Manifold has the potential to create effective and generalizable representations of medical concepts and patients by incorporating medical domain knowledge and prior data information. It enables deeper insights into medical data and facilitates the development of new analytical applications for better healthcare outcomes.