A harmonized Landsat Sentinel-2 (HLS) dataset for benchmarking time series reconstruction methods of vegetation indices

  • Davide Consoli (Creator)
  • Leandro Leal Parente (Creator)
  • Martijn Witjes (Creator)
  • Tomislav Hengl (Creator)



Satellite images can be used to derive time series of vegetation indices, such as the normalized difference vegetation index (NDVI) or the enhanced vegetation index (EVI), at global scale. Unfortunately, recording artifacts, clouds, and other atmospheric contaminants affect a significant portion of the images produced, so ad hoc techniques are required to reconstruct the time series in the affected regions. Several methods have been proposed in the literature to fill the gaps in the images, and some works have also compared their performance (Roerink et al., 2000; Moreno-Martínez et al., 2020; Siabi et al., 2022). Because no ground truth exists for the reconstructed images, performance evaluation requires datasets in which artificial gaps are introduced in a reference image, so that metrics such as the root mean square error (RMSE) can be computed by comparing the reconstructed images with the reference one. Different approaches have been used to create the reference images and the artificial gaps, but in most cases the artificial gaps are introduced using arbitrary patterns and/or the reference image is produced artificially rather than from real satellite images (e.g. Kandasamy et al., 2013; Liu et al., 2017; Julien &amp; Sobrino, 2018). In addition, to the best of our knowledge, few of these datasets are openly available and directly accessible, which limits fully reproducible research. We provide here a benchmark dataset for time series reconstruction methods based on the <strong>harmonized Landsat Sentinel-2 (HLS)</strong> collection, in which the artificial gaps are introduced with a realistic spatio-temporal distribution. In particular, we selected six tiles that we considered representative of most of the main climate classes (e.g. equatorial, arid, warm temperate, boreal and polar), as depicted in the preview.
Specifically, following the <strong>relative tiling system</strong> shown above, we downloaded the Red, NIR and F-mask bands from both the HLSL30 and HLSS30 collections for the tiles 19FCV, 22LEH, 32QPK, 31UFS, 45WFV and 49MWM. From the Red and NIR bands we derived the NDVI as \(NDVI = {NIR - Red \over NIR + Red}\) only for clear-sky, on-land pixels (F-mask bits 1, 3, 4 and 5 equal to zero), setting the remaining pixels to not a number (NaN). The images were then aggregated on a 16-day basis by averaging the available values for each pixel in each temporal range. We consider the resulting data as the reference data for the benchmarking and store them following the file naming convention <em>HLS.T&lt;TILE_NAME&gt;.&lt;YYYYDDD&gt;.v2.0.NDVI.tif</em>, where <em>TILE_NAME</em> is one of the tiles specified above, <em>YYYY</em> is the corresponding year (from 2015 to 2022) and <em>DDD</em> is the day of the year on which the corresponding 16-day range starts. As a result, each tile has a time series of <strong>184</strong> images (23 images per year for 8 years) that can be easily manipulated, for example using the <strong>Scikit-Map library</strong> in Python. Starting from those data, for each image we took the mask of the gaps already present, randomly rotated it by 90, 180 or 270 degrees, and added artificial gaps at the pixels of the rotated mask. We believe that the resulting spatio-temporal distribution of gaps remains realistic, providing a solid benchmark for gap-filling methods that work on time series, on spatial patterns, or on a combination of both. The data including the artificial gaps are stored with the naming structure <em>HLS.T&lt;TILE_NAME&gt;.&lt;YYYYDDD&gt;.v2.0.NDVI_art_gaps.tif</em>, following the previously mentioned convention.
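The processing steps described above (masked NDVI, 16-day averaging, and gap creation by mask rotation) can be illustrated with a minimal NumPy sketch. It is not the code used to build the dataset: the function names are hypothetical, the inputs are assumed to be reflectance arrays already scaled and free of fill values, and the rotation step assumes square images so that 90 and 270 degree rotations keep the array shape.

```python
import numpy as np

# F-mask bits that must all be zero for a pixel to be kept (taken from the text).
REJECT_BITS = (1, 3, 4, 5)
REJECT_VALUE = sum(1 << bit for bit in REJECT_BITS)  # integer with those bits set


def masked_ndvi(red, nir, fmask):
    """NDVI where the rejected F-mask bits are all zero, NaN elsewhere."""
    red = np.asarray(red, dtype=np.float64)
    nir = np.asarray(nir, dtype=np.float64)
    keep = (np.asarray(fmask, dtype=np.uint8) & REJECT_VALUE) == 0
    denom = nir + red
    keep &= denom != 0  # avoid division by zero (an added safeguard)
    ndvi = np.full(red.shape, np.nan)
    ndvi[keep] = (nir[keep] - red[keep]) / denom[keep]
    return ndvi


def composite_mean(stack):
    """Per-pixel mean of the non-NaN values of a (time, rows, cols) stack."""
    stack = np.asarray(stack, dtype=np.float64)
    valid = ~np.isnan(stack)
    count = valid.sum(axis=0)
    total = np.where(valid, stack, 0.0).sum(axis=0)
    out = np.full(stack.shape[1:], np.nan)
    # Pixels with no valid observation in the window stay NaN.
    np.divide(total, count, out=out, where=count > 0)
    return out


def add_artificial_gaps(reference, rng):
    """Rotate the existing gap mask by 90, 180 or 270 degrees and apply it."""
    reference = np.asarray(reference, dtype=np.float64)
    if reference.shape[0] != reference.shape[1]:
        raise ValueError("this sketch assumes square images")
    gap_mask = np.isnan(reference)
    quarter_turns = int(rng.integers(1, 4))  # 1, 2 or 3
    rotated = np.rot90(gap_mask, quarter_turns)
    with_gaps = reference.copy()
    with_gaps[rotated] = np.nan
    # Newly created gaps: covered by the rotated mask but valid in the reference.
    return with_gaps, rotated & ~gap_mask
```

A call such as `add_artificial_gaps(image, np.random.default_rng(0))` returns both the degraded image and the mask of newly created gaps, which is the set of pixels on which a reconstruction can later be scored.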
The performance metrics, such as RMSE or normalized RMSE (NRMSE), can be computed by applying a reconstruction method to the images with artificial gaps and then comparing the reconstructed time series with the reference one only at the locations of the artificially created gaps. This dataset was used to compare the performance of several gap-filling methods, and we provide a <strong>Jupyter notebook</strong> that shows how to access and use the data. The files are provided in GeoTIFF format and projected in the coordinate reference system WGS 84 / UTM zone 19N (EPSG:32619). If you achieve higher accuracy or develop a new gap-filling algorithm, please contact the authors or post on our GitHub repository. May the force be with you!
References:
Julien, Y., &amp; Sobrino, J. A. (2018). TISSBERT: A benchmark for the validation and comparison of NDVI time series reconstruction methods. Revista de Teledetección, (51), 19-31. https://doi.org/10.4995/raet.2018.9749
Kandasamy, S., Baret, F., Verger, A., Neveux, P., &amp; Weiss, M. (2013). A comparison of methods for smoothing and gap filling time series of remote sensing observations–application to MODIS LAI products. Biogeosciences, 10(6), 4055-4071. https://doi.org/10.5194/bg-10-4055-2013
Liu, R., Shang, R., Liu, Y., &amp; Lu, X. (2017). Global evaluation of gap-filling approaches for seasonal NDVI with considering vegetation growth trajectory, protection of key point, noise resistance and curve stability. Remote Sensing of Environment, 189, 164-179. https://doi.org/10.1016/j.rse.2016.11.023
Moreno-Martínez, Á., Izquierdo-Verdiguier, E., Maneta, M. P., Camps-Valls, G., Robinson, N., Muñoz-Marí, J., ... &amp; Running, S. W. (2020). Multispectral high resolution sensor fusion for smoothing and gap-filling in the cloud. Remote Sensing of Environment, 247, 111901. https://doi.org/10.1016/j.rse.2020.111901
Roerink, G. J., Menenti, M., &amp; Verhoef, W. (2000). Reconstructing cloudfree NDVI composites using Fourier analysis of time series. International Journal of Remote Sensing, 21(9), 1911-1917. https://doi.org/10.1080/014311600209814
Siabi, N., Sanaeinejad, S. H., &amp; Ghahraman, B. (2022). Effective method for filling gaps in time series of environmental remote sensing data: An example on evapotranspiration and land surface temperature images. Computers and Electronics in Agriculture, 193, 106619. https://doi.org/10.1016/j.compag.2021.106619
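As a minimal sketch of the evaluation described above, the snippet below pairs a reference file name with its artificial-gap counterpart and scores a reconstruction only on the artificially created gaps. The helper names are hypothetical, the arrays are assumed to be already loaded (for example with any GeoTIFF reader), and the NRMSE normalization by the range of the reference values is an assumption, since the description does not specify which normalization is used.

```python
import numpy as np


def file_names(tile, year, doy):
    """Reference and artificial-gap file names for one 16-day window."""
    stem = f"HLS.T{tile}.{year:04d}{doy:03d}.v2.0.NDVI"
    return f"{stem}.tif", f"{stem}_art_gaps.tif"


def gap_scores(reference, with_gaps, reconstructed):
    """RMSE and NRMSE computed only on the artificially created gaps.

    All inputs are arrays of the same shape (a single image or a full
    time series stack) with NaN marking missing pixels.
    """
    reference = np.asarray(reference, dtype=np.float64)
    with_gaps = np.asarray(with_gaps, dtype=np.float64)
    reconstructed = np.asarray(reconstructed, dtype=np.float64)

    # Artificial gaps: missing in the degraded data but valid in the reference.
    eval_mask = np.isnan(with_gaps) & ~np.isnan(reference)
    if not eval_mask.any():
        raise ValueError("no artificial gaps to evaluate")

    truth = reference[eval_mask]
    # If the method left some of these pixels unfilled, the result is NaN,
    # which flags an incomplete reconstruction.
    error = reconstructed[eval_mask] - truth
    rmse = float(np.sqrt(np.mean(error ** 2)))

    # Assumption: normalize by the range of the reference values.
    value_range = float(truth.max() - truth.min())
    nrmse = rmse / value_range if value_range > 0 else float("nan")
    return rmse, nrmse
```

Restricting the comparison to `eval_mask` keeps pixels that were already missing in the reference out of the score, since no ground truth exists for them.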
Date made available: 6 Jul 2023


  • Open Data
  • Time series reconstruction
  • Harmonized Landsat Sentinel-2
  • HLS
  • Benchmarking
  • NDVI
