Deep learning (DL) is emerging as a powerful tool for spectral data modelling. In many cases, DL models have been shown to outperform classical chemometric approaches to spectral data modelling. However, a main challenge with DL models is their limited interpretability, as they generate no scores or loadings. Scores and loadings are key parts of chemometric modelling because they facilitate model interpretation. Furthermore, in most of the reported literature, the performance of DL models is compared with that of basic chemometric approaches, with little attention paid to optimizing the chemometric models. This study aims to test the hypothesis that proper chemometric modelling of spectral data can achieve performance equivalent to that of DL models while retaining all the useful information, such as scores, loadings, and regression coefficients, that supports model interpretation. To test this, a case study is presented for the prediction of nitrogen content in rapeseed (Brassica napus L.) by Vis-NIR spectroscopy. On the classical chemometric side, two recently developed pre-processing fusion approaches were used: sequential pre-processing through orthogonalization (SPORT) and parallel pre-processing through orthogonalization (PORTO). On the DL side, previously published DL modelling results for the same data set were used as the benchmark. The comparison is valid because the chemometric analysis was performed on the same calibration and test sets as those used previously for the DL modelling. Results showed that the sequential (SPORT) and parallel (PORTO) approaches attained the same accuracy as the previously reported DL procedure on the same data set. In addition, the scores, loadings, and regression coefficients could be used for model interpretation.
- Ensemble learning
- Pre-processing fusion