The KnownLeaf literature curation system captures knowledge about Arabidopsis leaf growth and development and facilitates integrated data mining

D. Szakonyi, S. van Landeghem, K. Baerenfaller, L. Baeyens, J. Blomme, R. Casanova-Saéz, S. De Bodt, D. Esteve-Bruna, F. Fiorani, N. Gonzalez, J. Grønlund, R.G.H. Immink, S. Jover-Gil, A. Kuwabara, T. Muñoz-Nortes, A.D.J. van Dijk, D. Wilson-Sánchez, V. Buchanan-Wollaston, G.C. Angenent, Y. Van de Peer, D. Inzé, J.L. Micol, W. Gruissem, S. Walsh, P. Hilson

Research output: Contribution to journal > Article > Academic > peer-review

4 Citations (Scopus)

Abstract

The information that connects genotypes and phenotypes is essentially embedded in research articles written in natural language. To facilitate access to this knowledge, we constructed a framework for the curation of the scientific literature studying the molecular mechanisms that control leaf growth and development in Arabidopsis thaliana (Arabidopsis). Standard structured statements, called relations, were designed to capture diverse data types, including phenotypes and gene expression linked to genotype description, growth conditions, genetic and molecular interactions, and details about molecular entities. Relations were then annotated from the literature, defining the relevant terms according to standard biomedical ontologies. This curation process was supported by a dedicated graphical user interface, called Leaf Knowtator. A total of 283 primary research articles were curated by a community of annotators, yielding 9947 relations monitored for consistency and over 12,500 references to Arabidopsis genes. This information was converted into a relational database (KnownLeaf) and merged with other public Arabidopsis resources relative to transcriptional networks, protein–protein interaction, gene co-expression, and additional molecular annotations. Within KnownLeaf, leaf phenotype data can be searched together with molecular data originating either from this curation initiative or from external public resources. Finally, we built a network (LeafNet) with a portion of the KnownLeaf database content to graphically represent the leaf phenotype relations in a molecular context, offering an intuitive starting point for knowledge mining. Literature curation efforts such as ours provide high quality structured information accessible to computational analysis, and thereby to a wide range of applications. 
DATA: The presented work was performed in the framework of the AGRON-OMICS project (Arabidopsis GROwth Network integrating OMICS technologies), supported by the European Commission 6th Framework Programme (Grant number LSHG-CT-2006-037704). All data presented in our paper are available at https://agronomics.ethz.ch/, a data integration and data sharing portal collecting all the major results from the consortium.
Language: English
Pages: 1-11
Journal: Current Plant Biology
Volume: 2
DOI: https://doi.org/10.1016/j.cpb.2014.12.002
Publication status: Published - 2015

Fingerprint

Data mining
Growth and development
Arabidopsis
Genes
Phenotype
Molecular interactions
Data integration
Graphical user interfaces
Gene expression
Ontology
Leaves
Data acquisition
Proteins
Biological ontologies
User interface
Genotype

Cite this

Szakonyi, D. ; van Landeghem, S. ; Baerenfaller, K. ; Baeyens, L. ; Blomme, J. ; Casanova-Saéz, R. ; De Bodt, S. ; Esteve-Bruna, D. ; Fiorani, F. ; Gonzalez, N. ; Grønlund, J. ; Immink, R.G.H. ; Jover-Gil, S. ; Kuwabara, A. ; Muñoz-Nortes, T. ; van Dijk, A.D.J. ; Wilson-Sánchez, D. ; Buchanan-Wollaston, V. ; Angenent, G.C. ; Van de Peer, Y. ; Inzé, D. ; Micol, J.L. ; Gruissem, W. ; Walsh, S. ; Hilson, P. / The KnownLeaf literature curation system captures knowledge about Arabidopsis leaf growth and development and facilitates integrated data mining. In: Current Plant Biology. 2015 ; Vol. 2. pp. 1-11.
@article{b4246dc2a0c44b1e908c4585bd28cf43,
title = "The KnownLeaf literature curation system captures knowledge about Arabidopsis leaf growth and development and facilitates integrated data mining",
abstract = "The information that connects genotypes and phenotypes is essentially embedded in research articles written in natural language. To facilitate access to this knowledge, we constructed a framework for the curation of the scientific literature studying the molecular mechanisms that control leaf growth and development in Arabidopsis thaliana (Arabidopsis). Standard structured statements, called relations, were designed to capture diverse data types, including phenotypes and gene expression linked to genotype description, growth conditions, genetic and molecular interactions, and details about molecular entities. Relations were then annotated from the literature, defining the relevant terms according to standard biomedical ontologies. This curation process was supported by a dedicated graphical user interface, called Leaf Knowtator. A total of 283 primary research articles were curated by a community of annotators, yielding 9947 relations monitored for consistency and over 12,500 references to Arabidopsis genes. This information was converted into a relational database (KnownLeaf) and merged with other public Arabidopsis resources relative to transcriptional networks, protein–protein interaction, gene co-expression, and additional molecular annotations. Within KnownLeaf, leaf phenotype data can be searched together with molecular data originating either from this curation initiative or from external public resources. Finally, we built a network (LeafNet) with a portion of the KnownLeaf database content to graphically represent the leaf phenotype relations in a molecular context, offering an intuitive starting point for knowledge mining. Literature curation efforts such as ours provide high quality structured information accessible to computational analysis, and thereby to a wide range of applications. 
DATA: The presented work was performed in the framework of the AGRON-OMICS project (Arabidopsis GROwth Network integrating OMICS technologies), supported by the European Commission 6th Framework Programme (Grant number LSHG-CT-2006-037704). All data presented in our paper are available at https://agronomics.ethz.ch/, a data integration and data sharing portal collecting all the major results from the consortium.",
author = "D. Szakonyi and {van Landeghem}, S. and K. Baerenfaller and L. Baeyens and J. Blomme and R. Casanova-Sa{\'e}z and {De Bodt}, S. and D. Esteve-Bruna and F. Fiorani and N. Gonzalez and J. Gr{\o}nlund and R.G.H. Immink and S. Jover-Gil and A. Kuwabara and T. Mu{\~n}oz-Nortes and {van Dijk}, A.D.J. and D. Wilson-S{\'a}nchez and V. Buchanan-Wollaston and G.C. Angenent and {Van de Peer}, Y. and D. Inz{\'e} and J.L. Micol and W. Gruissem and S. Walsh and P. Hilson",
note = "http://www.sciencedirect.com/science/article/pii/S2214662815000031",
year = "2015",
doi = "10.1016/j.cpb.2014.12.002",
language = "English",
volume = "2",
pages = "1--11",
journal = "Current Plant Biology",
issn = "2214-6628",
publisher = "Elsevier BV",

}

Szakonyi, D, van Landeghem, S, Baerenfaller, K, Baeyens, L, Blomme, J, Casanova-Saéz, R, De Bodt, S, Esteve-Bruna, D, Fiorani, F, Gonzalez, N, Grønlund, J, Immink, RGH, Jover-Gil, S, Kuwabara, A, Muñoz-Nortes, T, van Dijk, ADJ, Wilson-Sánchez, D, Buchanan-Wollaston, V, Angenent, GC, Van de Peer, Y, Inzé, D, Micol, JL, Gruissem, W, Walsh, S & Hilson, P 2015, 'The KnownLeaf literature curation system captures knowledge about Arabidopsis leaf growth and development and facilitates integrated data mining', Current Plant Biology, vol. 2, pp. 1-11. https://doi.org/10.1016/j.cpb.2014.12.002
