Abstract:We present a domain adaptation of video diffusion models to generate highly realistic time-lapse microscopy videos of cell division in HeLa cells. Although state-of-the-art generative video models have advanced significantly for natural videos, they remain underexplored in microscopy domains. To address this gap, we fine-tune a pretrained video diffusion model on microscopy-specific sequences, exploring three conditioning strategies: (1) text prompts derived from numeric phenotypic measurements (e.g., proliferation rates, migration speeds, cell-death frequencies), (2) direct numeric embeddings of phenotype scores, and (3) image-conditioned generation, where an initial microscopy frame is extended into a complete video sequence. Evaluation using biologically meaningful morphological, proliferation, and migration metrics demonstrates that fine-tuning substantially improves realism and accurately captures critical cellular behaviors such as mitosis and migration. Notably, the fine-tuned model also generalizes beyond the training horizon, generating coherent cell dynamics even in extended sequences. However, precisely controlling specific phenotypic characteristics remains challenging, highlighting opportunities for future work to enhance conditioning methods. Our results demonstrate the potential for domain-specific fine-tuning of generative video models to produce biologically plausible synthetic microscopy data, supporting applications such as in-silico hypothesis testing and data augmentation.