In the realm of Earth science, effective cloud property retrieval, encompassing cloud masking, cloud phase classification, and cloud optical thickness (COT) prediction, remains pivotal. Traditional methodologies necessitate distinct models for each sensor instrument due to their unique spectral characteristics. Recent strides in Earth Science research have embraced machine learning and deep learning techniques to extract features from satellite datasets' spectral observations. However, prevailing approaches lack novel architectures accounting for hierarchical relationships among retrieval tasks. Moreover, considering the spectral diversity among existing sensors, the development of models with robust generalization capabilities over different sensor datasets is imperative. Surprisingly, there is a dearth of methodologies addressing the selection of an optimal model for diverse datasets. In response, this paper introduces MT-HCCAR, an end-to-end deep learning model employing multi-task learning to simultaneously tackle cloud masking, cloud phase retrieval (classification tasks), and COT prediction (a regression task). The MT-HCCAR integrates a hierarchical classification network (HC) and a classification-assisted attention-based regression network (CAR), enhancing precision and robustness in cloud labeling and COT prediction. Additionally, a comprehensive model selection method rooted in K-fold cross-validation, one standard error rule, and two introduced performance scores is proposed to select the optimal model over three simulated satellite datasets OCI, VIIRS, and ABI. The experiments comparing MT-HCCAR with baseline methods, the ablation studies, and the model selection affirm the superiority and the generalization capabilities of MT-HCCAR.