Predicting fungal infection sensitivity of sepals in harvested tomatoes using Hyperspectral Imaging and Partial Least Squares Discriminant Analysis

  • Mercedes Bertotto (Contributor)
  • de Villiers, H. (Contributor)
  • Chauhan, A. (Contributor)
  • Esther Hogeveen-van Echtelt (Contributor)
  • Mensink, M. (Contributor)
  • Zeljana Grobic (Contributor)
  • Dminitrije Stavonovic (Contributor)
  • Marko Panic (Contributor)
  • Sanja Brdar (Contributor)

Activity: Talk or presentation, Keynote talk, Academic

Description


A new method was developed to classify the sensitivity of sepals to fungal infection in recently harvested tomatoes using hyperspectral imaging and partial least squares discriminant analysis (PLSDA). In previous work by Brdar, S et al. (2021) and De Villiers, HAC et al. (2023), the influence of the variables was determined in the final model. In the present work, by contrast, an iterative process selects a sparse subset of important variables before they are used by the final model. As a result, only a small subset of wavelengths needs to be measured in unseen samples.
Thirty-two ‘Cappricia’ tomatoes without any visible signs of fungal infection were imaged in two separate, equally sized groups. Hyperspectral images were recorded on day one using a Specim FX17 NIR line-scan camera. The tomatoes were then stored under controlled conditions that encourage fungal growth: 20 °C, in a closed box reaching 100% relative humidity (RH), in a room at 60% RH, with lights on from 7:00 to 19:00 h at 15 μmol·s⁻¹·m⁻².
Ground truth observations were made by experts on days three and four and consisted of severity scores from zero (no fungus) to three (severe infection). The ratings from the two days were averaged. First, outlier pixels were removed from each tomato using principal component analysis (PCA). The remaining pixels belonging to the same sepal were then averaged, yielding 167 rows, one per sepal.
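The outlier removal and per-sepal averaging can be illustrated with a minimal sketch. The abstract does not state which PCA outlier criterion was used, so the rule below (a cut-off on the standardized distance in score space) is an assumption, as are the function names, the number of components, and the cut-off value. The data are synthetic.

```python
import numpy as np

def remove_pca_outliers(pixels, n_components=2, z_cut=3.0):
    """Drop pixels whose standardized PCA score distance exceeds z_cut.

    pixels: (n_pixels, n_bands) spectra of one tomato.
    n_components and z_cut are illustrative assumptions, not source values.
    """
    centered = pixels - pixels.mean(axis=0)
    # SVD of the centered data yields the principal component scores.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    # Standardize each component, then measure the distance from the origin.
    scaled = scores / scores.std(axis=0, ddof=1)
    dist = np.sqrt((scaled ** 2).sum(axis=1))
    keep = dist <= z_cut
    return pixels[keep], keep

def average_by_sepal(pixels, sepal_ids):
    """Average the spectra of all pixels that share a sepal id."""
    ids = np.unique(sepal_ids)
    means = np.vstack([pixels[sepal_ids == i].mean(axis=0) for i in ids])
    return ids, means

# Synthetic stand-in for one tomato: 200 pixels, 10 bands, 4 sepals.
rng = np.random.default_rng(0)
pixels = rng.normal(size=(200, 10))
pixels[0] += 50.0                       # one grossly deviating pixel
sepal_ids = np.repeat(np.arange(4), 50)

kept, keep = remove_pca_outliers(pixels)
ids, sepal_means = average_by_sepal(kept, sepal_ids[keep])
```

Each row of `sepal_means` corresponds to one sepal, which is the kind of table the 167 rows in the study represent.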
Samples were assigned to two classes according to the visual scores. Class 1 (negative) included ratings of 0.5 or less; Class 2 (positive) included ratings of 1 or greater. The data set was then divided randomly, by tomato, into calibration (70%) and validation (30%) sets. In addition to the raw data, several preprocessing methods were evaluated (Figure 1). Models were built on the calibration set using 11 to 40 variables selected by CovSel. The number of PLSDA latent variables was also optimized, by cross-validation on each tomato. Figure 1 shows the results of the different models and the pretreatments used. The number of selected variables was optimized for each model.
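As a rough illustration of these steps, the sketch below binarizes averaged scores, splits samples so that each tomato falls entirely into one set, and runs a greedy covariance selection in the spirit of CovSel. It is not the authors' implementation: the function names, the seed, and the synthetic data are hypothetical, the reading of "by tomato" as a whole-tomato split is an assumption, and the PLSDA fit on the selected variables is omitted.

```python
import numpy as np

def binarize_scores(avg_scores):
    """Class 1 (negative): rating <= 0.5. Class 2 (positive): rating >= 1."""
    return np.where(np.asarray(avg_scores) >= 1.0, 2, 1)

def split_by_tomato(tomato_ids, calib_fraction=0.7, seed=0):
    """Random split in which all sepals of a tomato share one set (assumed)."""
    rng = np.random.default_rng(seed)
    tomatoes = rng.permutation(np.unique(tomato_ids))
    n_calib = int(round(calib_fraction * len(tomatoes)))
    calib = np.isin(tomato_ids, tomatoes[:n_calib])
    return calib, ~calib

def covsel(X, Y, n_select):
    """Greedy covariance selection sketch.

    Each step picks the column of X with the largest summed squared
    covariance with Y, then projects that column out of X and Y.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    selected = []
    for _ in range(n_select):
        cov = X.T @ Y                                  # (n_vars, n_responses)
        j = int(np.argmax((cov ** 2).sum(axis=1)))
        selected.append(j)
        x = X[:, [j]]
        denom = float((x ** 2).sum())
        # Deflate: remove everything explained by the chosen column.
        X = X - x @ (x.T @ X) / denom
        Y = Y - x @ (x.T @ Y) / denom
    return selected

# Synthetic data: only columns 3 and 7 carry class information.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 12))
positive = (X[:, 3] + X[:, 7]) > 0
Y = np.column_stack([~positive, positive]).astype(float)   # one-hot classes
selected = covsel(X, Y, n_select=2)

labels = binarize_scores([0.0, 0.5, 1.0, 2.5])
tomato_ids = np.repeat(np.arange(10), 5)
calib, valid = split_by_tomato(tomato_ids)
```

In a workflow like the one described, the number of selected variables (11 to 40 in the study) and the number of latent variables would both be tuned by cross-validation on the calibration set only.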
The important variables found in this work are (nm): 937, 944, 951, 971, 1089, 1152, 1306, 1356, 1391, 1440, 1540, 1675, 1704, 1711, 1718. The best results were obtained using raw data, the wavelengths listed above, and 3 latent variables in PLSDA. The model reached a validation accuracy of 0.80, with sensitivity and specificity of 0.62 and 0.91, respectively, for class 1. The model therefore shows potential as a fast alternative method for classifying recently harvested tomatoes before fungal infection becomes visible.
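For reference, accuracy, sensitivity, and specificity for a chosen reference class can be computed from predicted and observed labels as in the sketch below. The function name and the example labels are hypothetical and do not reproduce the study's validation data.

```python
def binary_metrics(y_true, y_pred, positive):
    """Accuracy, sensitivity and specificity with `positive` as reference class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),   # reference-class samples found
        "specificity": tn / (tn + fp),   # other-class samples correctly rejected
    }

# Made-up labels for five sepals, evaluated with class 1 as the reference.
observed = [1, 1, 1, 2, 2]
predicted = [1, 1, 2, 2, 1]
metrics = binary_metrics(observed, predicted, positive=1)
```

Sensitivity and specificity swap roles when the other class is taken as the reference, which is why stating the reference class matters when reporting them.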

Period: 20 Aug 2023 to 24 Aug 2023
Event title: International Conference of Near Infrared Spectroscopy
Event type: Conference/symposium
Degree of Recognition: International