Building generic, easily-updatable chemometric models with harmonisation and augmentation features: The case of FTIR vegetable oils classification

Georgouli, K., Diaz-Chito, K., Martinez-del Rincon, J., Koidis, A.
Abstract:
Published literature in food authenticity studies is based on multivariate chemometric models that have been calibrated under controlled conditions using a limited dataset and a particular spectral acquisition instrument. The challenge is to create accurate and robust chemometric models that perform well on samples never encountered in the calibration data and that remain applicable when the acquisition instrument differs from the one used for calibration. Augmenting the models with synthetic samples is a fresh approach to these challenges. However, even when a chemometric model is extended with synthetic samples, there is always a danger of overfitting it to the calibration set, especially because this technique adds little new chemical information to the model. The only solution in this case is often to acquire more spectra from original authentic samples and retrain the models. The problem arises when the original data are not readily available. In all these situations, evolving an existing chemometric model may be a better solution than recreating or retraining it as a completely new batch, since evolving it requires access only to the existing models and the new samples. In this paper we therefore propose two approaches to tackle these challenges: a) a novel spectral data augmentation framework (DAF) that increases the performance of a typical classification model by generating realistic augmented samples, and b) a simple model updating framework for retraining models from large datasets.
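The abstract does not specify how the DAF generates its synthetic spectra. As a rough illustration only, the sketch below applies a generic perturbation model often used for spectral augmentation (random multiplicative scaling, constant baseline offset, linear baseline tilt and additive noise) to each calibration spectrum. The function names and all standard-deviation parameters are hypothetical choices, not the DAF or values from the paper.

```python
import random
from typing import List, Sequence, Tuple


def augment_spectrum(spectrum: Sequence[float], rng: random.Random,
                     scale_sd: float = 0.02, offset_sd: float = 0.01,
                     slope_sd: float = 0.01, noise_sd: float = 0.002) -> List[float]:
    """Return one synthetic variant of `spectrum`.

    Hypothetical, generic perturbation model; NOT the DAF described in the paper.
    """
    n = len(spectrum)
    scale = 1.0 + rng.gauss(0.0, scale_sd)   # multiplicative change (e.g. path length)
    offset = rng.gauss(0.0, offset_sd)       # constant baseline shift
    slope = rng.gauss(0.0, slope_sd)         # linear baseline tilt along the axis
    out = []
    for i, value in enumerate(spectrum):
        position = i / (n - 1) if n > 1 else 0.0  # relative position, 0..1
        out.append(scale * value + offset + slope * position
                   + rng.gauss(0.0, noise_sd))     # per-point random noise
    return out


def augment_dataset(spectra: Sequence[Sequence[float]], labels: Sequence[str],
                    copies_per_sample: int = 5,
                    seed: int = 0) -> Tuple[List[List[float]], List[str]]:
    """Keep every original spectrum and append synthetic copies with the same label."""
    rng = random.Random(seed)  # seeded so the augmented set is reproducible
    aug_x: List[List[float]] = []
    aug_y: List[str] = []
    for x, y in zip(spectra, labels):
        aug_x.append(list(x))
        aug_y.append(y)
        for _ in range(copies_per_sample):
            aug_x.append(augment_spectrum(x, rng))
            aug_y.append(y)
    return aug_x, aug_y
```

For example, `augment_dataset([[0.1, 0.5, 0.9, 0.4], [0.2, 0.3, 0.8, 0.6]], ["sunflower", "olive"])` returns 12 labelled spectra: each original followed by five perturbed copies. Whether such simple perturbations resemble real differences between FT-IR instruments would have to be checked against measured data.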
The feasibility of the proposed DAF was evaluated in three main experiments in which Fourier transform mid-infrared (FT-IR) spectroscopic data of vegetable oils were used to identify vegetable oil species in oil admixtures.
Results demonstrate a significant ~40% improvement in classification when testing on more than 10 spectroscopic instruments other than the calibration instrument. In addition, our novel model updating technique, Incremental Generalized Discriminative Common Vectors (IGDCV), applied to the same vegetable oil identification scenario, allowed faster model creation while maintaining the same high accuracy. We argue that combining the DAF and IGDCV techniques can produce models applicable in real-world settings, such as a spectroscopic sensor (NIR or Raman) on a food production floor or in a tea processing facility, among others.
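IGDCV itself is not reproduced here. To illustrate only the general principle stated in the abstract, that an evolving model needs just its stored state and the new samples rather than the original calibration data, the sketch below swaps in a much simpler technique: a nearest-class-mean classifier whose state is a running mean and count per class. The class and method names are hypothetical.

```python
from typing import Dict, List, Sequence


class IncrementalCentroidClassifier:
    """Nearest-class-mean classifier updated incrementally.

    Hypothetical stand-in used for illustration; NOT the IGDCV method.
    Its entire state is a per-class sample count and mean spectrum.
    """

    def __init__(self) -> None:
        self.counts: Dict[str, int] = {}
        self.means: Dict[str, List[float]] = {}

    def update(self, spectra: Sequence[Sequence[float]], labels: Sequence[str]) -> None:
        """Fold new labelled spectra into the model; old samples are never needed."""
        for x, y in zip(spectra, labels):
            if y not in self.means:              # a new class can be added at any time
                self.counts[y] = 0
                self.means[y] = [0.0] * len(x)
            self.counts[y] += 1
            n = self.counts[y]
            mean = self.means[y]
            for i, value in enumerate(x):
                mean[i] += (value - mean[i]) / n  # running-mean update

    def predict(self, spectrum: Sequence[float]) -> str:
        """Return the label whose mean spectrum is closest in squared Euclidean distance."""
        def sq_dist(mean: List[float]) -> float:
            return sum((a - b) ** 2 for a, b in zip(spectrum, mean))
        return min(self.means, key=lambda label: sq_dist(self.means[label]))
```

Calling `update` once with a first batch and again with a second batch gives the same class means as a single call on all samples, which is the property that lets such a model evolve without revisiting the original data. A discriminative method such as IGDCV has a richer state to update, so this only conveys the idea, not its mechanics.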
Keywords:
data augmentation, incremental model learning, classification, vegetable oils, spectroscopy
Download:
IMEKO-TC23-2017-019.pdf
DOI:
-
Event details
IMEKO TC:
TC23
Event name:
3rd IMEKOFOODS Conference
Title:
Metrology Promoting Standardization and Harmonization in Food and Nutrition
Place:
Thessaloniki, Greece
Time:
01 October 2017 - 04 October 2017