Crop classification is of critical importance due to its role in studying crop pattern changes, resource management, and carbon sequestration. When employing data-driven techniques for its prediction, utilizing various temporal data sources is necessary. Deep learning models have proven to be effective for this task by mapping time series data to high-level representation for prediction. However, they face substantial challenges when dealing with multiple input patterns. The literature offers limited guidance for Multi-View Learning (MVL) scenarios, as it has primarily focused on exploring fusion strategies with specific encoders and validating them in local regions. In contrast, we investigate the impact of simultaneous selection of the fusion strategy and the encoder architecture evaluated on a global-scale cropland and crop-type classifications. We use a range of five fusion strategies (Input, Feature, Decision, Ensemble, Hybrid) and five temporal encoder architectures (LSTM, GRU, TempCNN, TAE, L-TAE) as possible MVL model configurations. The validation is on the CropHarvest dataset that provides optical, radar, and weather time series, and topographic information as input data. We found that in scenarios with a limited number of labeled samples, a unique configuration is insufficient for all the cases. Instead, a specialized combination, including encoder and fusion strategy, should be meticulously sought. To streamline this search process, we suggest initially identifying the optimal encoder architecture tailored for a particular fusion strategy, and then determining the most suitable fusion strategy for the classification task. We provide a technical framework for researchers exploring crop classification or related tasks through a MVL approach.