In the Search for Optimal Multi-view Learning Models for Crop Classification with Global Remote Sensing Data

Studying and analyzing cropland is a difficult task due to its dynamic and heterogeneous growth behavior. Usually, diverse data sources can be collected for its estimation. Although deep learning models have proven to excel in the crop classification task, they face substantial challenges when dealing with multiple inputs, a setting known as Multi-View Learning (MVL). The methods used in the MVL scenario can be structured based on the encoder architecture, the fusion strategy, and the optimization technique. The literature has primarily focused on using specific encoder architectures for local regions, lacking a deeper exploration of the other components of the MVL methodology. In contrast, we investigate the simultaneous selection of the fusion strategy and the encoder architecture, assessing global-scale cropland and crop-type classification. We use five fusion strategies (Input, Feature, Decision, Ensemble, Hybrid) and five temporal encoders (LSTM, GRU, TempCNN, TAE, L-TAE) as possible configurations of the MVL method. We use the CropHarvest dataset for validation, which provides optical, radar, and weather time series, together with topographic information, as input data. We found that in scenarios with a limited number of labeled samples, no single configuration is sufficient for all cases. Instead, a specialized combination of encoder and fusion strategy should be carefully sought. To streamline this search process, we suggest first identifying the optimal encoder architecture for a particular fusion strategy, and then determining the most suitable fusion strategy for the classification task. We provide a methodological framework for researchers exploring crop classification through an MVL methodology.
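The suggested two-step search can be illustrated with a minimal sketch. It reflects only one possible reading of the procedure: fix a reference fusion strategy, select the best encoder under it, then compare all fusion strategies using that encoder. The names `two_step_search`, `evaluate`, and `reference_fusion` are hypothetical and not taken from the paper; `evaluate` is assumed to train one configuration and return a validation score where higher is better.

```python
from typing import Callable, Dict, Sequence, Tuple

# Configuration space as listed in the abstract.
FUSIONS = ("Input", "Feature", "Decision", "Ensemble", "Hybrid")
ENCODERS = ("LSTM", "GRU", "TempCNN", "TAE", "L-TAE")


def two_step_search(
    evaluate: Callable[[str, str], float],
    fusions: Sequence[str] = FUSIONS,
    encoders: Sequence[str] = ENCODERS,
    reference_fusion: str = "Input",
) -> Tuple[str, str]:
    """Return (best_fusion, best_encoder) under a two-step search.

    `evaluate(fusion, encoder)` is a hypothetical user-supplied function
    that trains one MVL configuration and returns a validation score
    (higher is better). The choice of `reference_fusion` is an assumption.
    """
    if reference_fusion not in fusions:
        raise ValueError(f"unknown reference fusion: {reference_fusion!r}")

    # Cache scores so that no configuration is trained twice.
    cache: Dict[Tuple[str, str], float] = {}

    def score(fusion: str, encoder: str) -> float:
        key = (fusion, encoder)
        if key not in cache:
            cache[key] = evaluate(fusion, encoder)
        return cache[key]

    # Step 1: best encoder for the fixed reference fusion strategy.
    best_encoder = max(encoders, key=lambda enc: score(reference_fusion, enc))

    # Step 2: best fusion strategy given the encoder chosen in step 1.
    best_fusion = max(fusions, key=lambda fus: score(fus, best_encoder))

    return best_fusion, best_encoder
```

Under this reading, the search trains 5 + 4 = 9 configurations instead of the 25 required by an exhaustive grid, at the cost of possibly missing a combination whose encoder is not the best one under the reference fusion strategy.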