This dataset contains sample data for the United States for running the weakly supervised framework described in the paper <em>A weakly supervised framework for high resolution crop yield forecasts</em> (https://doi.org/10.48550/arXiv.2205.09016).
The updated paper (including results from the US) is under review in Environmental Research Letters.
The software implementation of the machine learning baseline is available at: https://github.com/BigDataWUR/MLforCropYieldForecasting/tree/weaksup.
Data
1. County data (county-data.zip) for county-level strongly supervised models:
* CROP_AREA_COUNTY_US.csv: County crop production area statistics (acres). Source: NASS (USDA-NASS, 2022).
* CSSF_COUNTY_US.csv: Crop productivity indicators including total above-ground production (kg ha<sup>-1</sup>), total weight of storage organs (kg ha<sup>-1</sup>), development stage (0-2). Source: de Wit et al. (2022).
* METEO_COUNTY_US.csv: Meteorological data including maximum, minimum, and average daily air temperature (℃); sum of daily precipitation (PREC) (mm); sum of daily evapotranspiration of short vegetation (ET0) (Penman-Monteith; Allen et al., 1998) (mm); climate water balance = PREC - ET0 (mm). Source: Boogaard et al. (2022).
* REMOTE_SENSING_COUNTY_US.csv: Fraction of Absorbed Photosynthetically Active Radiation (Smoothed) (FAPAR). Source: Copernicus GLS (2020).
* SOIL_COUNTY_US.csv: Soil water holding capacity. Source: WISE Soil Property Database (Batjes, 2016).
* YIELD_COUNTY_US.csv: County yield statistics (bushels/acre). Source: NASS (USDA-NASS, 2022).
2. 10-km grid data (grid-data.zip) for grid-level strongly supervised models:
* COUNTY_GRIDS_US.csv: Mapping between counties and grids.
* CSSF_GRIDS_US.csv: Crop productivity indicators at 10km grid level (similar to county data above).
* METEO_GRIDs_US.csv: Meteo data at 10km grid level (similar to county data above).
* REMOTE_SENSING_GRIDS_US.csv: FAPAR at 10km grid level (similar to county data above).
* SOIL_GRIDS_US.csv: Soil water holding capacity at 10km grid level (similar to county data above).
* YIELD_GRIDS_US.csv: Grid-level modeled yields (t ha<sup>-1</sup>). Source: Deines et al. (2021).
3. County labels and 10-km grid inputs (dscale-US.zip) for weak supervision:
* COUNTY_GRIDS_US.csv: Mapping between counties and grids.
* CSSF_GRIDS_US.csv: Crop productivity indicators at 10km grid level.
* METEO_GRIDs_US.csv: Meteo indicators at 10km grid level.
* REMOTE_SENSING_GRIDS_US.csv: FAPAR at 10km grid level.
* SOIL_GRIDS_US.csv: Soil water holding capacity at 10km grid level.
* YIELD_GRIDS_US.csv: Grid-level modeled yields (t ha<sup>-1</sup>). Source: Deines et al. (2021).
* YIELD_COUNTY_US.csv: County yield statistics (bushels/acre). Source: NASS (USDA-NASS, 2022).
* CROP_AREA_COUNTY_US.csv: County crop production area statistics (acres). Source: NASS (USDA-NASS, 2022).
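The county labels are reported in bushels/acre while the grid-level modeled yields are in t ha<sup>-1</sup>, so the two need a common unit before they can be compared. The sketch below shows one possible way to do this; it is not taken from the paper or the linked repository. It assumes the crop is corn with the standard US bushel weight of 56 lb (other crops use other weights), it takes a simple unweighted mean of the grid cells in each county, and the identifiers and values are toy placeholders rather than actual columns or records from the CSV files.

```python
# Unit constants (exact definitions).
LB_TO_KG = 0.45359237
ACRE_TO_HA = 0.40468564224

# Assumption: corn at the standard US bushel weight of 56 lb.
# Pass a different weight for other crops.
CORN_LB_PER_BUSHEL = 56.0


def bushels_per_acre_to_t_per_ha(bu_per_acre, lb_per_bushel=CORN_LB_PER_BUSHEL):
    """Convert a yield from bushels/acre to metric tonnes per hectare."""
    kg_per_acre = bu_per_acre * lb_per_bushel * LB_TO_KG
    kg_per_ha = kg_per_acre / ACRE_TO_HA
    return kg_per_ha / 1000.0


def mean_grid_yield_by_county(county_grids, grid_yields):
    """Average grid-level yields (t/ha) over the grid cells of each county.

    county_grids: iterable of (county_id, grid_id) pairs (hypothetical layout
                  of a county-to-grid mapping).
    grid_yields:  dict mapping grid_id to yield in t/ha.
    Grid cells without a yield value are skipped.
    """
    sums = {}
    counts = {}
    for county_id, grid_id in county_grids:
        if grid_id not in grid_yields:
            continue
        sums[county_id] = sums.get(county_id, 0.0) + grid_yields[grid_id]
        counts[county_id] = counts.get(county_id, 0) + 1
    return {county_id: sums[county_id] / counts[county_id] for county_id in sums}


# Toy values for illustration only; not taken from the dataset.
county_grids = [("C1", "g1"), ("C1", "g2"), ("C2", "g3")]
grid_yields = {"g1": 10.0, "g2": 12.0, "g3": 9.0}

county_means = mean_grid_yield_by_county(county_grids, grid_yields)
label_t_ha = bushels_per_acre_to_t_per_ha(175.0)

print(county_means)
print(round(label_t_ha, 3))
```

An unweighted mean is the simplest choice here; weighting each grid cell by its crop area would be a natural refinement, but the files listed above provide crop area only at county level.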
Keywords: crop yield; deep learning; weak supervision; disaggregation; spatial variability