Towards a harmonization of distributed trait datasets

Schneider, Florian D.; Jochum, Malte; Le Provost, Gaëtane; Ostrowski, Andreas; Penone, Caterina; Simons, Nadja K.

Trait-based research ranges from evolutionary studies of individual-level properties to global patterns of biodiversity and ecosystem functioning. An increasing amount of trait data is available for many different organism groups, published as open-access data on a variety of file hosting services. However, standardization between datasets is generally lacking because data formats and types are heterogeneous, so compiling these published data into centralised databases remains a difficult and time-consuming task. We reviewed existing trait databases and online services, as well as initiatives for trait data standardization. Together with data providers and users, we identified a need for a minimal trait-data terminology that is flexible enough to include traits from all types of organisms but simple enough to be adopted by different research communities. To facilitate reproducibility of analyses, the reuse of data and the combination of datasets from multiple sources, we propose a standardized vocabulary for trait data that is compatible with existing ontologies. We tested the vocabulary using trait datasets from several research groups working on different taxa and questions in a large project (the Biodiversity Exploratories). By relying on unambiguous identifiers, the proposed minimal vocabulary for trait data captures the different degrees of resolution and measurement detail for multiple use cases of trait-based research. It further encourages the use of global Uniform Resource Identifiers (URIs) for taxa and trait definitions, methods and units, thereby following the standards for a semantic web of scientific data. In addition, we developed an R-based tool to convert any trait dataset into the proposed standard format. The R package facilitates the upload of one's own data to hosting services and simplifies access to published trait data, including direct access to trait datasets that have been published in the public domain or under Creative Commons licenses. All these products are available through the GitHub platform, with the aim of continuous collaboration and improvement with the research community.

KEYWORDS: traits, standardization, ontology, semantic web, tools, distributed data, R package, Biodiversity Exploratories
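
To illustrate what converting a trait dataset into a standard format can involve, the following minimal Python sketch reshapes a wide species-by-trait table into one record per measurement. It is not the authors' R package. The function name `to_long_format`, the column names (`scientificName`, `traitName`, `traitValue`, `traitUnit`) and the example values are assumptions made for this illustration and are not taken from the abstract.

```python
# Illustrative sketch only; it does not reproduce the authors' R-based tool.
# All column names and values below are hypothetical.

def to_long_format(rows, taxon_column, trait_units):
    """Reshape a wide species-by-trait table into one record per measurement."""
    long_rows = []
    for row in rows:
        for trait, unit in trait_units.items():
            value = row.get(trait)
            if value is None or value == "":
                continue  # skip missing measurements
            long_rows.append({
                "scientificName": row[taxon_column],
                "traitName": trait,
                "traitValue": value,
                "traitUnit": unit,
            })
    return long_rows


# Made-up example input: one row per taxon, one column per trait.
wide = [
    {"species": "Species A", "body_length": 23.5, "wing_length": None},
    {"species": "Species B", "body_length": 15.0, "wing_length": 0.0},
]
units = {"body_length": "mm", "wing_length": "mm"}

long_table = to_long_format(wide, "species", units)
for record in long_table:
    print(record)
```

In such a layout, identifiers for taxa, trait definitions, methods and units (for example URIs) could be carried as additional columns alongside the names used here.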


Citation style:

Schneider, Florian / Jochum, Malte / Le Provost, Gaëtane / et al: Towards a harmonization of distributed trait datasets. Jena 2018.

