Calcium imaging has become a powerful alternative to electrophysiology for studying neuronal activity, offering high spatial resolution and the ability to record from large populations of neurons in a minimally invasive manner. The technique has broad applications in neuroscience, neuroengineering, and medicine, enabling researchers to explore the relationship between neuron location and activity. Recent advances in deep generative models (DGMs) have facilitated the modeling of neuronal population dynamics, uncovering latent representations that provide insights into behavior prediction and neuronal variance. However, these models typically rely on spike inference algorithms and focus on population-level dynamics, limiting their applicability to single-neuron analyses. To address this gap, we propose a novel framework for single-neuron representation learning based on autoregressive variational autoencoders (AVAEs). Our approach embeds each neuron's spatiotemporal signal into a low-dimensional space without requiring spike inference. The AVAE outperforms traditional linear methods, producing more informative and discriminative latent representations that improve visualization, clustering, and the interpretation of neuronal activity. Its reconstruction performance also surpasses the state of the art, demonstrating that the original fluorescence signal can be accurately recovered from the learned representation. Using realistic simulations, we show that the model captures underlying physical properties and connectivity patterns, enabling it to distinguish between different firing and connectivity types. These findings position the AVAE as a versatile and powerful tool for advancing single-neuron analysis and lay the groundwork for future integration of multimodal single-cell datasets in neuroscience.
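To make the architecture concrete, the sketch below illustrates one plausible AVAE for single-neuron fluorescence traces: a recurrent encoder compresses a trace into a low-dimensional latent code, and an autoregressive recurrent decoder reconstructs the trace one frame at a time, conditioned on the latent code and the previous frame. This is a minimal sketch under stated assumptions; the module choices (GRU layers), dimensions, and names (`AVAE`, `elbo_loss`) are illustrative, not the paper's exact configuration.

```python
# Minimal AVAE sketch for single-neuron fluorescence traces (PyTorch).
# All architecture choices here (GRU encoder/decoder, latent size 16)
# are illustrative assumptions, not the paper's reported configuration.
import torch
import torch.nn as nn


class AVAE(nn.Module):
    def __init__(self, latent_dim: int = 16, hidden_dim: int = 64):
        super().__init__()
        # Encoder: reads the whole trace and summarizes it in a latent code.
        self.encoder = nn.GRU(input_size=1, hidden_size=hidden_dim, batch_first=True)
        self.to_mu = nn.Linear(hidden_dim, latent_dim)
        self.to_logvar = nn.Linear(hidden_dim, latent_dim)
        # Decoder: autoregressive; each step receives the previous
        # fluorescence value concatenated with the latent code.
        self.decoder = nn.GRU(input_size=1 + latent_dim, hidden_size=hidden_dim, batch_first=True)
        self.readout = nn.Linear(hidden_dim, 1)

    def forward(self, x):
        # x: (batch, time, 1) fluorescence traces.
        _, h = self.encoder(x)                          # h: (1, batch, hidden)
        mu, logvar = self.to_mu(h[-1]), self.to_logvar(h[-1])
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterize
        # Teacher forcing: shift the trace right by one frame so the
        # decoder predicts x_t from x_{t-1} and z.
        prev = torch.cat([torch.zeros_like(x[:, :1]), x[:, :-1]], dim=1)
        z_rep = z.unsqueeze(1).expand(-1, x.size(1), -1)
        out, _ = self.decoder(torch.cat([prev, z_rep], dim=-1))
        return self.readout(out), mu, logvar


def elbo_loss(recon, x, mu, logvar):
    # Negative ELBO: squared-error reconstruction + KL to a unit Gaussian.
    rec = ((recon - x) ** 2).sum(dim=(1, 2)).mean()
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1).mean()
    return rec + kl


if __name__ == "__main__":
    model = AVAE()
    traces = torch.randn(8, 200, 1)   # 8 synthetic traces, 200 frames each
    recon, mu, logvar = model(traces)
    print(elbo_loss(recon, traces, mu, logvar))
```

After training, `mu` serves as the neuron's embedding for downstream visualization or clustering, which is the role the latent representation plays in the analyses described above.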