Prioritizing candidate eQTL causal genes in Arabidopsis using RANDOM FORESTS

Margi Hartanto*, Asif Ahmed Sami, Dick de Ridder, Harm Nijveen*

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Expression quantitative trait locus mapping has been widely used to study the genetic regulation of gene expression in Arabidopsis thaliana. As a result, a large amount of expression quantitative trait locus data has been generated for this model plant; however, only a few causal expression quantitative trait locus genes have been identified, and experimental validation is costly and laborious. A prioritization method could help speed up the identification of causal expression quantitative trait locus genes. This study extends the machine-learning-based QTG-Finder2 method, which prioritizes candidate causal genes in phenotype quantitative trait loci, so that it can also be applied to expression quantitative trait loci, by adding gene structure, protein interaction, and gene expression features. Independent validation shows that the new algorithm can prioritize 16 out of 25 potential expression quantitative trait locus causal genes within the top 20% of the ranking. Several new features are important in prioritizing causal expression quantitative trait locus genes, including the number of protein-protein interactions, unique domains, and introns. Overall, this study provides a foundation for developing computational methods to prioritize candidate expression quantitative trait locus causal genes. Predictions for all genes are available in the AraQTL workbench (https://www.bioinformatics.nl/AraQTL/) to support the identification of gene expression regulators in Arabidopsis.
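
The abstract reports its validation result as the number of known causal genes that fall within the top 20% of ranked candidates. The sketch below illustrates one way such a figure could be computed. It is an assumption for illustration, not the authors' procedure: the gene names, scores, tie handling, and the helper functions `rank_percentile` and `fraction_in_top` are all hypothetical, and the scores merely stand in for whatever a trained model would output.

```python
def rank_percentile(scores, gene):
    """Return the rank of `gene` divided by the number of candidates.

    Rank 1 is the highest score. Ties are given the worst rank among the
    tied genes, which is a conservative choice (an assumption here).
    """
    target = scores[gene]
    rank = sum(1 for s in scores.values() if s >= target)
    return rank / len(scores)


def fraction_in_top(intervals, cutoff=0.20):
    """Count the intervals whose known causal gene ranks within `cutoff`.

    `intervals` is a list of (scores, causal_gene) pairs, where `scores`
    maps every candidate gene in one eQTL interval to a model score.
    """
    hits = sum(
        1 for scores, causal in intervals
        if rank_percentile(scores, causal) <= cutoff
    )
    return hits, len(intervals)


# Made-up toy data: two eQTL intervals with five candidate genes each.
toy_intervals = [
    ({"geneA": 0.91, "geneB": 0.40, "geneC": 0.35, "geneD": 0.10, "geneE": 0.05}, "geneA"),
    ({"geneF": 0.20, "geneG": 0.80, "geneH": 0.60, "geneI": 0.55, "geneJ": 0.30}, "geneH"),
]

hits, total = fraction_in_top(toy_intervals)
print(f"{hits} of {total} causal genes ranked in the top 20%")
```

In the toy data, the first causal gene ranks first of five candidates and counts as a hit, while the second ranks second of five and does not.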

Original language: English
Article number: jkac255
Journal: G3 (Bethesda, Md.)
Volume: 12
Issue number: 11
Publication status: Published - Nov 2022

Keywords

  • Arabidopsis thaliana
  • causal gene
  • eQTL
  • gene expression
  • machine learning
