Identifying and extracting quantitative data in annotated text

    Research output: Chapter in Book/Report/Conference proceeding › Conference paper › Academic › Peer-review

    1 Citation (Scopus)

    Abstract

    In science it is difficult to reuse quantitative scientific data. For example, it is not possible to search for quantitative data in papers in a directed way, such as with the query "Select the storage modulus of dairy product A after the temperature has decreased from 90 to 4 °C". This is because data are made available in (relatively) free formats, such as scientific papers, spreadsheets, or databases, all with limited annotation and little description of how they were obtained. Meaning is lost, for example about what the numbers relate to (quantities and units are often poorly indicated). Many researchers, especially in the physical and computer sciences, use LaTeX to write scientific papers. In this paper we present a set of LaTeX style files, based on the terminology defined at wurvoc.org, that can be used to annotate scientific papers. These style files define a set of commands, each representing a specific quantity or unit. When the LaTeX source is typeset into a PDF file, the quantities and units in the PDF are annotated with the appropriate references (URIs) to the corresponding concepts in the OM ontology. This not only disambiguates the use of these quantities and units, but also enables us to extract triples from the PDF, so that SPARQL queries can answer advanced quantitative questions.
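The last step the abstract describes, answering a quantitative question over triples extracted from annotated quantities and units, can be illustrated with a small Python sketch. It is not the authors' implementation and does not parse PDFs or use their style files. Several things in it are assumptions: the OM namespace URI and concept names (`storage_modulus`, `temperature`, `pascal`, `degree_Celsius`), the hypothetical `http://example.org/dairy/` properties (`ofProduct`, `atTemperature`, `value`, `unit`), and all numbers, which are placeholders rather than data. Instead of a real SPARQL engine it uses a hand-written basic-graph-pattern matcher, the core operation behind a SPARQL `SELECT`, and it simplifies "after cooling from 90 °C" to "measured at 4 °C".

```python
"""Toy sketch: OM-style quantity/unit annotations as triples, plus a naive
SPARQL-like SELECT. All names and numbers below are illustrative assumptions."""
from typing import Dict, List, Optional, Tuple, Union

# Assumed namespaces. OM 1.8 was published under wurvoc.org, but the concept
# names used here (storage_modulus, pascal, ...) are illustrative, not checked.
OM = "http://www.wurvoc.org/vocabularies/om-1.8/"
EX = "http://example.org/dairy/"  # hypothetical namespace for the study data
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

Term = Union[str, float]
Triple = Tuple[str, str, Term]
Binding = Dict[str, Term]


def measure(node: str, quantity: str, value: float, unit: str) -> List[Triple]:
    """Triples for one annotated number: which quantity, which value, which unit."""
    subject = EX + node
    return [
        (subject, RDF_TYPE, OM + quantity),
        (subject, EX + "value", value),
        (subject, EX + "unit", OM + unit),
    ]


def unify(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`; None if impossible."""
    extended = dict(binding)
    for term, actual in zip(pattern, triple):
        if isinstance(term, str) and term.startswith("?"):  # a variable
            if term in extended and extended[term] != actual:
                return None
            extended[term] = actual
        elif term != actual:  # a constant must match exactly
            return None
    return extended


def select(patterns: List[Triple], graph: List[Triple]) -> List[Binding]:
    """Naive basic-graph-pattern matching, the core of a SPARQL SELECT."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for binding in bindings:
            for triple in graph:
                extended = unify(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Placeholder data (not measurements): storage modulus of product A at 90 °C
# and again at 4 °C, each linked to an annotated temperature.
graph: List[Triple] = [
    *measure("G1", "storage_modulus", 150.0, "pascal"),
    (EX + "G1", EX + "ofProduct", EX + "productA"),
    (EX + "G1", EX + "atTemperature", EX + "T1"),
    *measure("T1", "temperature", 90.0, "degree_Celsius"),
    *measure("G2", "storage_modulus", 2300.0, "pascal"),
    (EX + "G2", EX + "ofProduct", EX + "productA"),
    (EX + "G2", EX + "atTemperature", EX + "T2"),
    *measure("T2", "temperature", 4.0, "degree_Celsius"),
]

# Roughly the SPARQL pattern:
#   ?g a om:storage_modulus ; ex:ofProduct ex:productA ; ex:atTemperature ?t ;
#      ex:value ?v ; ex:unit ?u .
#   ?t ex:value 4.0 ; ex:unit om:degree_Celsius .
query: List[Triple] = [
    ("?g", RDF_TYPE, OM + "storage_modulus"),
    ("?g", EX + "ofProduct", EX + "productA"),
    ("?g", EX + "atTemperature", "?t"),
    ("?t", EX + "value", 4.0),
    ("?t", EX + "unit", OM + "degree_Celsius"),
    ("?g", EX + "value", "?v"),
    ("?g", EX + "unit", "?u"),
]

for row in select(query, graph):
    print(row["?v"], row["?u"])
```

Because the unit is an explicit URI rather than a bare number, the query can require degrees Celsius and return the pascal unit alongside the value. A real setup would more likely load the extracted triples into an RDF store and run an actual SPARQL query; the matcher above scans the whole graph for every pattern, which only suits toy data.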
    Original language: English
    Title of host publication: Proceedings of the Workshop on Semantic Web and Information Extraction (SWAIE 2012), 09 October 2012, Galway, Ireland
    Editors: D. Maynard, M. van Erp, B. Davis
    Pages: 43-54
    Publication status: Published (2012)
    Event: Semantic Web and Information Extraction 2012 (SWAIE2012), in conjunction with the 18th International Conference on Knowledge Engineering and Knowledge Management, Galway, Ireland
    Duration: 9 Oct 2012

    Cite this

    Willems, D. J. M., Rijgersberg, H., & Top, J. (2012). Identifying and extracting quantitative data in annotated text. In D. Maynard, M. van Erp, & B. Davis (Eds.), Proceedings of the Workshop on Semantic Web and Information Extraction (SWAIE 2012), 09 October 2012, Galway, Ireland (pp. 43-54).
    @inproceedings{d7dd32390e364ae1ae5e49ed5e53e2f5,
    title = "Identifying and extracting quantitative data in annotated text",
    author = "D.J.M. Willems and H. Rijgersberg and J. Top",
    year = "2012",
    language = "English",
    pages = "43--54",
    editor = "Maynard, D. and {van Erp}, M. and Davis, B.",
    booktitle = "Proceedings of the Workshop on Semantic Web and Information Extraction (SWAIE 2012), 09 October 2012, Galway, Ireland",

    }
