Designing Eukaryotic Gene Expression Regulation Using Machine Learning

Ronald P.H. de Jongh, Aalt D.J. van Dijk, Mattijs K. Julsing, Peter J. Schaap, Dick de Ridder*

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Controlling the expression of genes is one of the key challenges of synthetic biology. Until recently fine-tuned control has been out of reach, particularly in eukaryotes owing to their complexity of gene regulation. With advances in machine learning (ML) and in particular with increasing dataset sizes, models predicting gene expression levels from regulatory sequences can now be successfully constructed. Such models form the cornerstone of algorithms that allow users to design regulatory regions to achieve a specific gene expression level. In this review we discuss strategies for data collection, data encoding, ML practices, design algorithm choices, and finally model interpretation. Ultimately, these developments will provide synthetic biologists with highly specific genetic building blocks to rationally engineer complex pathways and circuits.

Original language: English
Pages (from-to): 191-201
Journal: Trends in Biotechnology
Volume: 38
Issue number: 2
Early online date: 17 Aug 2019
DOI: 10.1016/j.tibtech.2019.07.007
Publication status: Published - Feb 2020

Fingerprint

Gene expression regulation
Gene expression
Learning systems
Synthetic Biology
Nucleic Acid Regulatory Sequences
Eukaryota
Genes
Engineers
Networks (circuits)
Machine Learning

Keywords

  • DNA design
  • eukaryotic gene expression
  • gene regulation
  • machine learning
  • synthetic biology

Cite this

@article{63c0d1db27ec40489e8742ccd5390091,
title = "Designing Eukaryotic Gene Expression Regulation Using Machine Learning",
abstract = "Controlling the expression of genes is one of the key challenges of synthetic biology. Until recently fine-tuned control has been out of reach, particularly in eukaryotes owing to their complexity of gene regulation. With advances in machine learning (ML) and in particular with increasing dataset sizes, models predicting gene expression levels from regulatory sequences can now be successfully constructed. Such models form the cornerstone of algorithms that allow users to design regulatory regions to achieve a specific gene expression level. In this review we discuss strategies for data collection, data encoding, ML practices, design algorithm choices, and finally model interpretation. Ultimately, these developments will provide synthetic biologists with highly specific genetic building blocks to rationally engineer complex pathways and circuits.",
keywords = "DNA design, eukaryotic gene expression, gene regulation, machine learning, synthetic biology",
author = "{de Jongh}, {Ronald P.H.} and {van Dijk}, {Aalt D.J.} and Julsing, {Mattijs K.} and Schaap, {Peter J.} and {de Ridder}, Dick",
year = "2020",
month = "2",
doi = "10.1016/j.tibtech.2019.07.007",
language = "English",
volume = "38",
pages = "191--201",
journal = "Trends in Biotechnology",
issn = "0167-7799",
publisher = "Elsevier",
number = "2",
}

Designing Eukaryotic Gene Expression Regulation Using Machine Learning. / de Jongh, Ronald P.H.; van Dijk, Aalt D.J.; Julsing, Mattijs K.; Schaap, Peter J.; de Ridder, Dick.

In: Trends in Biotechnology, Vol. 38, No. 2, 02.2020, p. 191-201.

TY - JOUR

T1 - Designing Eukaryotic Gene Expression Regulation Using Machine Learning

AU - de Jongh, Ronald P.H.

AU - van Dijk, Aalt D.J.

AU - Julsing, Mattijs K.

AU - Schaap, Peter J.

AU - de Ridder, Dick

PY - 2020/2

Y1 - 2020/2

AB - Controlling the expression of genes is one of the key challenges of synthetic biology. Until recently fine-tuned control has been out of reach, particularly in eukaryotes owing to their complexity of gene regulation. With advances in machine learning (ML) and in particular with increasing dataset sizes, models predicting gene expression levels from regulatory sequences can now be successfully constructed. Such models form the cornerstone of algorithms that allow users to design regulatory regions to achieve a specific gene expression level. In this review we discuss strategies for data collection, data encoding, ML practices, design algorithm choices, and finally model interpretation. Ultimately, these developments will provide synthetic biologists with highly specific genetic building blocks to rationally engineer complex pathways and circuits.

KW - DNA design

KW - eukaryotic gene expression

KW - gene regulation

KW - machine learning

KW - synthetic biology

U2 - 10.1016/j.tibtech.2019.07.007

DO - 10.1016/j.tibtech.2019.07.007

M3 - Article

VL - 38

SP - 191

EP - 201

JO - Trends in Biotechnology

JF - Trends in Biotechnology

SN - 0167-7799

IS - 2

ER -