Evaluating and deploying genome-scale metabolic models for microbial cell factories

Nhung Pham

Research output: Thesis, internal PhD, WU


Advances in genome sequencing and high-throughput technologies have boosted the development of Synthetic biology and Systems biology. Synthetic biology aims to create and reprogram natural systems. Advances in Synthetic biology have facilitated the adoption of the Design-Build-Test-Learn (DBTL) cycle in metabolic engineering. The DBTL cycle is a recursive loop that aims to make the development of microbial cell factories more systematic and efficient. Systems biology aims to study living organisms at the system level using holistic approaches. Among the modelling tools in Systems biology, genome-scale constraint-based metabolic models (GEMs) are the most successful approach to studying the whole metabolic network. A GEM is a comprehensive knowledge base containing the metabolic reactions known to occur in a target organism. GEMs have been used in many applications to guide metabolic engineering and to contextualize ‘omics’ data. The objective of this thesis is to deploy GEMs for microbial cell factories and evaluate their main technical limitations.

Chapter 1 describes two paradigms in the life sciences, reductionism and holism, and introduces Systems biology, genome-scale constraint-based metabolic models, Synthetic biology, and the Design-Build-Test-Learn cycle. It provides the background for all other chapters.

In Chapter 2 I constructed a GEM for Cutaneotrichosporon oleaginosus to model its lipid production. C. oleaginosus is a fast-growing oleaginous yeast that can grow on a wide range of low-cost carbon sources. I constructed the GEM to increase our understanding of this yeast and to provide a knowledge base for further industrial use. A new modelling approach was introduced to account for changes in the biomass composition of this organism under conditions with a high carbon to nitrogen (C/N) ratio in the medium. This approach is shown to better predict conditions with high lipid accumulation using glucose, fructose, sucrose, xylose, or glycerol as the sole carbon source. The model suggests ATP-citrate lyase as a possible target to further improve lipid production.

Producing chemicals with living cells is considered a sustainable approach. However, the biosynthesis of many natural compounds is still limited by the lack of efficient synthesis routes. As a showcase of how GEMs can assist in designing pathways for chemical production in microbes, in Chapter 3 I employed GEMs to design and evaluate pathways for the production of cis,cis-muconic acid, anisole, aniline, 3-methylmalate, and geranic acid in Pseudomonas putida in the context of the Design-Build-Test-Learn cycle. I established a general system to rank these pathways based on thermodynamic feasibility, enzyme sequence availability, and maximum theoretical yield. Among our target compounds, cis,cis-muconic acid is a well-known chemical with many published biosynthesis pathways. For this well-studied target, we predicted 8 pathways, 2 of which had not been reported earlier. Novel pathways were predicted for the production of anisole, aniline, 3-methylmalate, and geranic acid.
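The three ranking criteria named above can be combined in many ways, and the abstract does not state the scoring scheme used in the thesis. As a minimal sketch, assuming a simple lexicographic ordering (feasible pathways first, then the fraction of steps with a known enzyme sequence, then yield), the idea could look as follows. The `Pathway` class, its field names, and all numeric values are hypothetical placeholders, not results from the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pathway:
    """Hypothetical record for one candidate production pathway."""
    name: str
    thermodynamically_feasible: bool
    steps_total: int
    steps_with_known_enzyme_sequence: int
    max_theoretical_yield: float  # assumed unit: mol product per mol substrate

    @property
    def enzyme_coverage(self) -> float:
        # Fraction of pathway steps for which an enzyme sequence is available.
        return self.steps_with_known_enzyme_sequence / self.steps_total


def rank_pathways(pathways):
    """Sort best-first: feasibility, then enzyme coverage, then yield."""
    return sorted(
        pathways,
        key=lambda p: (
            p.thermodynamically_feasible,
            p.enzyme_coverage,
            p.max_theoretical_yield,
        ),
        reverse=True,
    )


# Placeholder candidates (illustrative values only).
candidates = [
    Pathway("pathway_A", True, 4, 4, 0.5),
    Pathway("pathway_B", True, 5, 4, 0.9),
    Pathway("pathway_C", False, 3, 3, 1.0),
    Pathway("pathway_D", True, 4, 4, 0.7),
]

ranked = rank_pathways(candidates)
ranked_names = [p.name for p in ranked]
print(ranked_names)
```

A lexicographic key keeps the ordering easy to explain; a weighted score would be an equally plausible alternative if the criteria should trade off against each other.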

While constructing and using GEMs in Chapters 2 and 3, two main problems recurred. The first is the use of inconsistent namespaces among GEMs. A critical step in constructing GEMs is manual curation, in which information from independent (organism-specific) sources is integrated to provide a comprehensive representation of what is presently known about the metabolism of the modelled organism. Combining this valuable information from individual GEMs into a consensus model of the organism is essential. Using models from different species as a foundation for a new model can help to avoid repeating the same time-consuming manual curation step. In addition, GEMs need to be updated continuously because new knowledge emerges rapidly. However, these seemingly simple tasks are hard to carry out for one reason: inconsistent namespaces. GEMs constructed for different organisms by different researchers often use different naming conventions, depending on which databases were selected for model construction. While mapping between namespaces seems like the only viable solution, it carries a high risk of mismatches and may invalidate the model. In Chapter 4 I evaluated the (in)consistency of names and non-systematic identifiers used in 11 databases of biochemical reactions, and the problems that arise when mapping between different namespaces and databases. I found that such inconsistencies can be as high as 83.1%, which emphasizes the need for strategies to deal with these issues. Currently, manual verification of the mappings appears to be the only way to remove inconsistencies when combining models.
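To make the namespace problem concrete, the following is a minimal sketch of one way an inconsistency rate between two databases could be measured: among the names both databases share, count those that point to different identifiers. This is an assumed, simplified metric and not necessarily the one used in Chapter 4; the function name, the toy names, and the `ID_...` identifiers are hypothetical.

```python
def inconsistency_rate(db_a, db_b):
    """Return (rate, conflicts) for two name -> identifier mappings.

    rate is the fraction of shared names whose identifiers disagree;
    conflicts is the sorted list of those names.
    """
    shared = db_a.keys() & db_b.keys()
    if not shared:
        return 0.0, []
    conflicts = sorted(name for name in shared if db_a[name] != db_b[name])
    return len(conflicts) / len(shared), conflicts


# Toy mappings with made-up identifiers (not taken from any real database).
db_a = {"glucose": "ID_001", "citrate": "ID_002", "malate": "ID_003", "pyruvate": "ID_004"}
db_b = {"glucose": "ID_001", "citrate": "ID_009", "malate": "ID_003", "lactate": "ID_005"}

rate, conflicts = inconsistency_rate(db_a, db_b)
print(f"{rate:.1%} of shared names disagree: {conflicts}")
```

Names that disagree are exactly the entries that would need manual verification before two models could be merged.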

The second problem relates to the efficiency of gap-filling tools. The lack of accurate functional annotations often renders GEMs incomplete, giving rise to missing reactions, the so-called ‘gaps’ in the network. Gap-filling is important during model construction, not only to obtain a functional model but also to generate new knowledge on protein function. Many algorithms have been published to assist gap-filling. For GEMs to be used effectively, these methods should make the model as accurate as possible, and preferably be user-friendly so that they are accessible to many researchers. However, gap-filling algorithms differ vastly in their objectives, implementation platforms, and input data requirements. These differences imply variation in their usability and accuracy. In Chapter 5 I conducted an extensive evaluation of these algorithms from a user’s perspective. We found that most of the tools are not used because a workable implementation is lacking. From those for which an implementation is readily available, we selected SMILEY, FASTGAPFILL and Meneco to further investigate their performance. Of the three, SMILEY performed best for small-scale degradation.
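The notion of a ‘gap’ can be illustrated with a toy topological formulation: a draft network cannot produce a target metabolite from the available seed metabolites, and the task is to find the smallest set of reactions from a reference database that restores production. The sketch below is a brute-force illustration under these assumptions only; it is not the algorithm of SMILEY, FASTGAPFILL, or Meneco, and all reaction and metabolite names are hypothetical.

```python
from itertools import combinations


def producible(reactions, seeds):
    """Network expansion: all metabolites reachable from the seed set."""
    available = set(seeds)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            # A reaction can fire once all of its substrates are available.
            if set(substrates) <= available and not set(products) <= available:
                available |= set(products)
                changed = True
    return available


def gap_fill(draft, universal, seeds, target):
    """Smallest set of universal reaction names that makes target producible.

    Tries subsets in order of increasing size, so the first hit is minimal.
    Exponential in the size of `universal`; suitable for toy examples only.
    """
    for size in range(len(universal) + 1):
        for names in combinations(sorted(universal), size):
            reactions = list(draft.values()) + [universal[n] for n in names]
            if target in producible(reactions, seeds):
                return list(names)
    return None  # no combination of candidate reactions closes the gap


# Each reaction is (substrates, products). The draft lacks a step from B to C.
draft = {"r1": (("A",), ("B",)), "r3": (("C",), ("D",))}
universal = {
    "u1": (("B",), ("C",)),
    "u2": (("B", "X"), ("C",)),
    "u3": (("D",), ("E",)),
}

solution = gap_fill(draft, universal, seeds={"A"}, target="D")
print(solution)
```

Real gap-filling tools replace the exhaustive search with optimization or logic programming and may also use flux constraints, which is one source of the differences in usability and accuracy discussed above.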

Finally, in Chapter 6 I discuss three significant themes that stood out across Chapters 2 to 5: 1) the lack of standards in namespaces, tool development, and guidelines for model evaluation, 2) the need to improve models and computational tools, for instance to account for uncertainty in the biomass synthesis reaction or to improve gap-filling algorithms, and 3) the potential contribution of GEMs to the DBTL cycle.

In conclusion, the work presented in this thesis illustrates how the lack of standards in GEMs can hamper their potential. GEMs have great potential in the DBTL cycle. Standardization and improvements in GEM formulation are needed to maximize the use of these models.

Original language: English
Qualification: Doctor of Philosophy
Awarding Institution
  • Wageningen University
  • Martins dos Santos, Vitor, Promotor
  • Schaap, Peter, Co-promotor
  • Suarez-Diez, Maria, Co-promotor
Award date: 25 Jan 2021
Place of Publication: Wageningen
Print ISBNs: 9789463956550
Publication status: Published - 2021

