The Empusa code generator and its application to GBOL, an extendable ontology for genome annotation

Jesse C.J. van Dam, Jasper J. Koehorst, Jon Olav Vik, Vitor A.P. Martins dos Santos, Peter J. Schaap, Maria Suarez-Diez*

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

The RDF data model facilitates the integration of diverse data available in structured and semi-structured formats. To obtain a coherent RDF graph, the chosen ontology must be applied consistently. However, the addition of diverse new data causes the ontology to evolve, which can lead to the accumulation of unintended, erroneous composites. Thus, there is a need for a gatekeeping system that compares the intended content described in the ontology with the actual content of the resource. The Empusa code generator facilitates the creation of composite RDF resources from disparate sources. Empusa can convert a schema into an associated application programming interface (API) that can be used to perform data consistency checks. It also generates Markdown documentation that makes persistent URLs resolvable. Using Empusa, consistency is ensured within the ontology, within the content of the resource, and between the two. As an illustration of the potential of Empusa, we present the Genome Biology Ontology Language (GBOL). GBOL uses and extends current ontologies to provide a formal representation of genomic entities, along with their properties, relations and provenance.
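The abstract describes a single schema that drives both data consistency checks and Markdown documentation. Below is a minimal, dependency-free Python sketch of that general idea. It compares typed RDF-like triples against declared property types and cardinalities, which resembles the datatype and cardinality constraints expressible in W3C SHACL.

This is an illustration under stated assumptions, not Empusa's method. The schema format, the `ex:` terms, and the functions `check` and `to_markdown` are all hypothetical. None of them is Empusa's actual input format, generated API, or output, and none is a real GBOL term.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

RDF_TYPE = "rdf:type"


@dataclass(frozen=True)
class Literal:
    """A literal RDF value; plain Python strings are treated as resource identifiers."""
    value: object


@dataclass(frozen=True)
class PropertySpec:
    """Hypothetical schema entry: a property, its expected value type and its cardinality."""
    name: str
    range_type: str
    min_count: int = 0
    max_count: Optional[int] = None  # None means unbounded


# Illustrative schema using hypothetical ex: terms (not actual GBOL or Empusa terms).
SCHEMA = {
    "ex:Gene": [
        PropertySpec("ex:locusTag", "xsd:string", 0, 1),
        PropertySpec("ex:location", "ex:Region", 1, 1),
    ],
    "ex:Region": [
        PropertySpec("ex:begin", "xsd:integer", 1, 1),
        PropertySpec("ex:end", "xsd:integer", 1, 1),
    ],
}

# Literal datatypes mapped to the exact Python types that represent them.
LITERAL_TYPES = {"xsd:string": str, "xsd:integer": int}


def conforms(obj, range_type, types):
    """True if obj is a literal of the expected datatype, or a resource typed with range_type."""
    if range_type in LITERAL_TYPES:
        return isinstance(obj, Literal) and type(obj.value) is LITERAL_TYPES[range_type]
    return not isinstance(obj, Literal) and range_type in types.get(obj, set())


def check(triples, schema):
    """Compare actual triples with the intended schema; return human-readable errors."""
    types = defaultdict(set)
    values = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)
        else:
            values[s][p].append(o)

    errors = []
    for subject, classes in types.items():
        specs = [spec for cls in sorted(classes) for spec in schema.get(cls, [])]
        subject_values = values.get(subject, {})
        # Flag properties that no class of this subject declares.
        declared = {spec.name for spec in specs}
        for predicate in subject_values:
            if predicate not in declared:
                errors.append(f"{subject}: undeclared property {predicate}")
        # Check cardinality and value type of every declared property.
        for spec in specs:
            objs = subject_values.get(spec.name, [])
            if len(objs) < spec.min_count:
                errors.append(f"{subject}: {spec.name} has {len(objs)} value(s), expected at least {spec.min_count}")
            if spec.max_count is not None and len(objs) > spec.max_count:
                errors.append(f"{subject}: {spec.name} has {len(objs)} value(s), expected at most {spec.max_count}")
            for o in objs:
                if not conforms(o, spec.range_type, types):
                    errors.append(f"{subject}: {spec.name} value {o!r} is not of type {spec.range_type}")
    return errors


def to_markdown(schema):
    """Render the same schema as a Markdown page: one table of properties per class."""
    lines = []
    for cls, specs in schema.items():
        lines += [f"## {cls}", "", "| Property | Type | Cardinality |", "|---|---|---|"]
        for spec in specs:
            upper = "*" if spec.max_count is None else str(spec.max_count)
            lines.append(f"| {spec.name} | {spec.range_type} | {spec.min_count}..{upper} |")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    example = [
        ("ex:gene1", RDF_TYPE, "ex:Gene"),
        ("ex:gene1", "ex:locusTag", Literal("b0001")),
        ("ex:gene1", "ex:location", "ex:region1"),
        ("ex:region1", RDF_TYPE, "ex:Region"),
        ("ex:region1", "ex:begin", Literal(190)),
        ("ex:region1", "ex:end", Literal("255")),  # wrong datatype: string instead of integer
    ]
    for error in check(example, SCHEMA):
        print(error)  # prints: ex:region1: ex:end value Literal(value='255') is not of type xsd:integer
    print(to_markdown(SCHEMA))
```

The schema is kept as plain data, so the same definitions feed both the gatekeeping check and the generated documentation. In miniature, this mirrors the abstract's point that one schema yields both consistency checks and documentation.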

Original language: English
Article number: 254
Number of pages: 1
Journal: Scientific Data
Volume: 6
Issue number: 1
DOI: 10.1038/s41597-019-0263-7
Publication status: Published - 4 Nov 2019

Cite this

@article{e2916c3744504279bf41e5501dc1164c,
title = "The Empusa code generator and its application to GBOL, an extendable ontology for genome annotation",
author = "{van Dam}, {Jesse C.J.} and Koehorst, {Jasper J.} and Vik, {Jon Olav} and {Martins dos Santos}, {Vitor A.P.} and Schaap, {Peter J.} and Maria Suarez-Diez",
year = "2019",
month = "11",
day = "4",
doi = "10.1038/s41597-019-0263-7",
language = "English",
volume = "6",
journal = "Scientific Data",
issn = "2052-4463",
publisher = "Macmillan Publishers",
number = "1"
}

The Empusa code generator and its application to GBOL, an extendable ontology for genome annotation. / van Dam, Jesse C.J.; Koehorst, Jasper J.; Vik, Jon Olav; Martins dos Santos, Vitor A.P.; Schaap, Peter J.; Suarez-Diez, Maria.

In: Scientific Data, Vol. 6, No. 1, 254, 04.11.2019.

TY - JOUR

T1 - The Empusa code generator and its application to GBOL, an extendable ontology for genome annotation

AU - van Dam, Jesse C.J.

AU - Koehorst, Jasper J.

AU - Vik, Jon Olav

AU - Martins dos Santos, Vitor A.P.

AU - Schaap, Peter J.

AU - Suarez-Diez, Maria

PY - 2019/11/4

Y1 - 2019/11/4

U2 - 10.1038/s41597-019-0263-7

DO - 10.1038/s41597-019-0263-7

M3 - Article

VL - 6

JO - Scientific Data

JF - Scientific Data

SN - 2052-4463

IS - 1

M1 - 254

ER -