In this paper, we consider a tunable liquid convex lens-assisted imaging receiver for indoor multiple-input multiple-output (MIMO) visible light communication (VLC) systems. In contrast to existing MIMO VLC receivers that rely on fixed optical lenses, the proposed receiver exploits the additional degrees of freedom offered by liquid lenses through adjustment of both the focal length and the orientation angles of the lens. This capability helps mitigate the spatial correlation between the channel gains, thereby enhancing the overall signal quality and improving bit-error rate (BER) performance. We present an accurate channel model for the liquid lens-assisted VLC system using three-dimensional geometry and geometric optics. To achieve optimal performance under practical conditions such as random receiver orientation and user mobility, both the focal length and the orientation angles of the lens must be optimized. To this end, motivated by the mathematical complexity of the channel model, we present two optimization schemes, including a blockwise machine learning (ML) architecture that comprises convolution layers to extract spatial features from the received signal, long short-term memory layers to predict the user position and orientation, and fully connected layers to estimate the optimal lens parameters. Numerical results compare the performance of each scheme with that of conventional receivers. The results show that a significant BER improvement is achieved when liquid lenses and the proposed ML-based optimization approaches are used. Specifically, the BER improves from $6\times 10^{-2}$ to $1.4\times 10^{-3}$ at an average signal-to-noise ratio of $30$ dB.