TY - JOUR
T1 - The automation of the development of classification models and improvement of model quality using feature engineering techniques
AU - Boeschoten, Sjoerd
AU - Catal, Cagatay
AU - Tekinerdogan, Bedir
AU - Lommen, Arjen
AU - Blokland, Marco
PY - 2023/3/1
Y1 - 2023/3/1
AB - Pipelines of machine learning-based classification models have recently become important for codifying, orchestrating, and automating the workflow that produces an effective machine learning model. In this article, we propose a framework that combines feature engineering techniques such as data imputation, transformation, and class balancing to compare the performance of different prediction models and select the best final model based on predefined parameters. The proposed framework is extendable and configurable: algorithms supported by the CARET package, implemented in the R programming language, can be added to it. The framework allows practitioners and researchers to generate different classification models automatically, and the generated models provide results comparable to those of other studies. This research is the first in the literature to use High-Resolution Orbitrap-based Mass Spectrometer (HRMS) data to create automated prediction models. We demonstrated the applicability of feature engineering techniques such as data imputation, transformation (e.g., scaling and centering), and data balancing using several case studies and the proposed semi-automated framework. We showed how the initial prediction models can be improved using the proposed framework.
KW - Automation
KW - Data balancing
KW - Data imputation
KW - Feature engineering
KW - Feature transformation
KW - Machine learning
KW - Machine learning pipeline
U2 - 10.1016/j.eswa.2022.118912
DO - 10.1016/j.eswa.2022.118912
M3 - Article
AN - SCOPUS:85139012625
SN - 0957-4174
VL - 213
JO - Expert Systems with Applications
JF - Expert Systems with Applications
M1 - 118912
ER -