We present a framework for predicting the popularity of image-based social media content that addresses two challenges: complex image information and a hierarchical data structure. We use the Google Cloud Vision API to extract key image and color information from users' posts; including these features yields 6.8% higher accuracy than using non-image covariates alone. For prediction, we compare a range of models, including the Linear Mixed Model, Support Vector Regression, Multi-layer Perceptron, Random Forest, and XGBoost, with linear regression as the benchmark. Our comparative study shows that models capable of capturing the underlying nonlinear interactions between covariates outperform the other methods.
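
To make the last claim concrete, the following is a minimal sketch on synthetic data, not the study's data, features, or models. It assumes a hypothetical response driven partly by the product of two covariates. It compares an additive linear fit with a fit that also sees the interaction term. The interaction-augmented least-squares fit is only a simple stand-in for the nonlinear learners named above (Random Forest, XGBoost, and so on). All function and variable names are illustrative.

```python
import random


def solve_least_squares(X, y):
    """Solve the normal equations (X^T X) b = X^T y by Gauss-Jordan elimination."""
    p = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(p)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(p)
    ]
    for col in range(p):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        piv = A[col][col]
        A[col] = [v / piv for v in A[col]]
        for r in range(p):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][p] for i in range(p)]


def mse(X, y, beta):
    """Mean squared error of the linear predictor X @ beta."""
    preds = (sum(b * x for b, x in zip(beta, row)) for row in X)
    return sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / len(y)


def make_data(n, rng):
    """Synthetic (assumed) data: the response depends on the product x1 * x2."""
    rows, y = [], []
    for _ in range(n):
        x1, x2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        rows.append((x1, x2))
        y.append(0.5 * x1 + 2.0 * x1 * x2 + rng.gauss(0, 0.1))
    return rows, y


def additive(rows):
    # Design matrix for the benchmark: intercept plus main effects only.
    return [[1.0, x1, x2] for x1, x2 in rows]


def with_interaction(rows):
    # Same design matrix with the x1 * x2 interaction column added.
    return [[1.0, x1, x2, x1 * x2] for x1, x2 in rows]


rng = random.Random(0)
train_rows, train_y = make_data(500, rng)
test_rows, test_y = make_data(500, rng)

beta_add = solve_least_squares(additive(train_rows), train_y)
beta_int = solve_least_squares(with_interaction(train_rows), train_y)

mse_additive = mse(additive(test_rows), test_y, beta_add)
mse_interaction = mse(with_interaction(test_rows), test_y, beta_int)

print(f"held-out MSE, additive benchmark:   {mse_additive:.3f}")
print(f"held-out MSE, with interaction term: {mse_interaction:.3f}")
```

Under these assumptions the additive benchmark cannot represent the product term, so its held-out error stays well above the noise level, while the model that sees the interaction approaches it. Tree ensembles and neural networks can learn such interactions without a hand-specified term, which is consistent with the pattern reported in the comparison.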