A template framework for environmental timeseries data acquisition

Research output: Contribution to journal › Article › Academic › peer-review

2 Citations (Scopus)

Abstract

Environmental timeseries data variety is exploding in the Internet of Things era, making data reuse a very demanding task. Data acquisition and integration remain a laborious step of the environmental data lifecycle. Environmental data heterogeneity is a persistent issue, as data are becoming available through different protocols and stored under diverse, custom formats. In this work, we deal with syntactic heterogeneity in environmental timeseries data. Our approach is based on describing different dataset syntaxes using abstract representations, called templates. We designed and implemented EDAM (Environmental Data Acquisition Module), a template framework that facilitates timeseries data acquisition and integration. EDAM templates are written using programming language-agnostic semantics, and can be reused both for input and output, thus enabling data reuse via transformations across different formats. We demonstrate EDAM's generality in seven case studies, which involve scraping online data, extracting observations from a relational database, or aggregating historical timeseries stored in local files. The case studies span different environmental science domains, including meteorology, agriculture, urban air quality and hydrology. We also demonstrate EDAM for data dissemination, as instructed by output templates. Through the case studies, we identified several syntactic interoperability challenges, including differences in the formatting of observables, temporal and spatial references, and metadata documentation, and addressed them with EDAM. The EDAM implementation has been released under an open-source license.
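
The idea of one template serving both input and output can be illustrated with a minimal Python sketch. This is not EDAM's template syntax or API: the `{placeholder}` notation, the function names (`template_to_regex`, `parse`, `render`, `convert`) and the sample record are all assumptions made up for illustration. It only shows how a single abstract description of a line's syntax might drive both parsing and re-serialization.

```python
import re


def template_to_regex(template: str) -> re.Pattern:
    """Compile a template such as '{timestamp},{value}' into a regex with named groups.

    Hypothetical helper: literal text is matched verbatim, each {name}
    placeholder becomes a non-greedy named capture group.
    """
    # re.split with one capture group alternates literal text and placeholder names.
    parts = re.split(r"\{(\w+)\}", template)
    pattern = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pattern += re.escape(part)       # literal separator text
        else:
            pattern += f"(?P<{part}>.+?)"    # placeholder -> named group
    return re.compile("^" + pattern + "$")


def parse(template: str, line: str) -> dict:
    """Use the template as an input description: extract fields from a line."""
    match = template_to_regex(template).match(line)
    if match is None:
        raise ValueError(f"line does not match template: {line!r}")
    return match.groupdict()


def render(template: str, record: dict) -> str:
    """Use the template as an output description: serialize a record."""
    return template.format(**record)


def convert(line: str, input_template: str, output_template: str) -> str:
    """Transform a line from one syntax to another via two templates."""
    return render(output_template, parse(input_template, line))


if __name__ == "__main__":
    # Made-up sample observation, for illustration only.
    csv_template = "{timestamp},{station},{temperature}"
    report_template = "{station};{timestamp};{temperature} C"
    print(convert("2019-07-01T12:00,ST01,31.5", csv_template, report_template))
```

In this sketch the same template string could be passed to either `parse` or `render`, which is the property the abstract attributes to templates: describing a format once and reusing that description in both directions.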

Original language: English
Pages (from-to): 237-249
Journal: Environmental Modelling and Software
Volume: 117
DOI: 10.1016/j.envsoft.2018.10.009
Publication status: Published - 1 Jul 2019

Fingerprint

Data acquisition
Data integration
Syntactics
Meteorology
Environmental data
Hydrology
Metadata
Air quality
Interoperability
Computer programming languages
Agriculture
Semantics
Network protocols

Keywords

  • Big data
  • Data acquisition
  • Environmental timeseries
  • Internet of things
  • Syntactic interoperability
  • Templates

Cite this

@article{5e73e7ae07bb421cad88b2681d536184,
title = "A template framework for environmental timeseries data acquisition",
abstract = "Environmental timeseries data variety is exploding in the Internet of Things era, making data reuse a very demanding task. Data acquisition and integration remain a laborious step of the environmental data lifecycle. Environmental data heterogeneity is a persistent issue, as data are becoming available through different protocols and stored under diverse, custom formats. In this work, we deal with syntactic heterogeneity in environmental timeseries data. Our approach is based on describing different dataset syntaxes using abstract representations, called templates. We designed and implemented EDAM (Environmental Data Acquisition Module), a template framework that facilitates timeseries data acquisition and integration. EDAM templates are written using programming language-agnostic semantics, and can be reused both for input and output, thus enabling data reuse via transformations across different formats. We demonstrate EDAM's generality in seven case studies, which involve scraping online data, extracting observations from a relational database, or aggregating historical timeseries stored in local files. The case studies span different environmental science domains, including meteorology, agriculture, urban air quality and hydrology. We also demonstrate EDAM for data dissemination, as instructed by output templates. Through the case studies, we identified several syntactic interoperability challenges, including differences in the formatting of observables, temporal and spatial references, and metadata documentation, and addressed them with EDAM. The EDAM implementation has been released under an open-source license.",
keywords = "Big data, Data acquisition, Environmental timeseries, Internet of things, Syntactic interoperability, Templates",
author = "Argyrios Samourkasidis and Evangelia Papoutsoglou and Athanasiadis, {Ioannis N.}",
year = "2019",
month = "7",
day = "1",
doi = "10.1016/j.envsoft.2018.10.009",
language = "English",
volume = "117",
pages = "237--249",
journal = "Environmental Modelling \& Software",
issn = "1364-8152",
publisher = "Elsevier",

}

A template framework for environmental timeseries data acquisition. / Samourkasidis, Argyrios; Papoutsoglou, Evangelia; Athanasiadis, Ioannis N.

In: Environmental Modelling and Software, Vol. 117, 01.07.2019, p. 237-249.

TY - JOUR

T1 - A template framework for environmental timeseries data acquisition

AU - Samourkasidis, Argyrios

AU - Papoutsoglou, Evangelia

AU - Athanasiadis, Ioannis N.

PY - 2019/7/1

Y1 - 2019/7/1

N2 - Environmental timeseries data variety is exploding in the Internet of Things era, making data reuse a very demanding task. Data acquisition and integration remain a laborious step of the environmental data lifecycle. Environmental data heterogeneity is a persistent issue, as data are becoming available through different protocols and stored under diverse, custom formats. In this work, we deal with syntactic heterogeneity in environmental timeseries data. Our approach is based on describing different dataset syntaxes using abstract representations, called templates. We designed and implemented EDAM (Environmental Data Acquisition Module), a template framework that facilitates timeseries data acquisition and integration. EDAM templates are written using programming language-agnostic semantics, and can be reused both for input and output, thus enabling data reuse via transformations across different formats. We demonstrate EDAM's generality in seven case studies, which involve scraping online data, extracting observations from a relational database, or aggregating historical timeseries stored in local files. The case studies span different environmental science domains, including meteorology, agriculture, urban air quality and hydrology. We also demonstrate EDAM for data dissemination, as instructed by output templates. Through the case studies, we identified several syntactic interoperability challenges, including differences in the formatting of observables, temporal and spatial references, and metadata documentation, and addressed them with EDAM. The EDAM implementation has been released under an open-source license.

KW - Big data

KW - Data acquisition

KW - Environmental timeseries

KW - Internet of things

KW - Syntactic interoperability

KW - Templates

U2 - 10.1016/j.envsoft.2018.10.009

DO - 10.1016/j.envsoft.2018.10.009

M3 - Article

VL - 117

SP - 237

EP - 249

JO - Environmental Modelling & Software

JF - Environmental Modelling & Software

SN - 1364-8152

ER -