A flexible Diversity Workbench tool to publish biodiversity data from SQL database networks through platforms like GFBio

Weiss, Markus; Weibulat, Tanja; Seifert, Stefan; Monje, Juan Carlos; Ruff, Marcel; Neubacher, Dieter; Reichert, Wolfgang; Triebel, Dagmar

The Diversity Workbench (DWB, www.diversityworkbench.net) is a suite of MS SQL databases and data processing tools designed for the management of research data in biology, ecology and geosciences. Apart from the underlying DBMS, the software is open source, and the complete software package is freely available. With ten data domain-specific databases, one generic database and several independent data processing tools, the DWB is the result of 20 years of software development. It is suitable for single researchers and research groups of any size, and equally for setting up networks for long-term data repositories and data centers. To automate the transfer of bio- and geodiversity data for publication from such in-house networks, a new DWB tool was implemented. Its core functions are the filtering and transformation of data and metadata from selected in-house data collections stored in productive master SQL databases. The tool is designed for use by database administrators and scientific data curators. For each data collection, it performs three major steps:
• assignment of terms, taxa and metadata, with parallel data export and creation of a first-level cache MS SQL database that is not publicly accessible and is independent of the DWB master database network; this step unifies data from the data domain-specific DWB source databases inside an (institutional) firewall
• re-organisation of data, filtering according to the later data package assignment, and creation of a second-level, publicly accessible PostgreSQL database
• creation of a publicly accessible data package properly formatted for the data harvesting tools of web portals and for data mapping and provision software like the BIOCASe Provider Software.
This DWB tool for guiding data publication addresses several major challenges in bio- and geodiversity research: a) Data filtering, transformation and publication can be repeated periodically and are carried out without changing data or losing information in the linked in-house master databases, which may be curated in the long run. b) A data expert or data scientist is able to operate this transformation tool and organize data publication with only minor involvement of a database administrator. c) The data packages are configured for publication according to the individual requests of data producers, who often ask for the anonymization of certain persons, intend to withhold single data units, set embargoes, or have to blur geographic coordinates. d) With the automated data transfer for publication, the tool guarantees a reproducible path from the original source to the presentation on a platform. The data centers SMNS and SNSB are using the tool to guide their data publication through the GBIF global biodiversity data network (https://www.gbif.org/) and through the GFBio platform (https://www.gfbio.org/). Furthermore, it is used to create specifically formatted, publicly accessible cache databases with filtered and aggregated content for thematically focused information portals like the Botanischer Informationsknoten Bayern (http://daten.bayernflora.de/de/index.php). In summary, the new DWB tool supports a wide range of data transfer and transformation tasks for data publication from DWB networks. It is included in the published software versions of DiversityCollection and described in its manual. Future plans include extending the tool to cover data filtering and transformation from DiversityDescriptions, a generic DWB source database.
KEYWORDS: automated data transfer, biodiversity data, data filtering, data publication, GFBio data centers
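
The publication filters named under c) can be pictured with a minimal Python sketch. This is not the DWB implementation: the function name, the record fields (`withhold`, `embargo_until`, `blur_coordinates`, `collector`) and the rounding-based blurring are assumptions made purely for illustration. The sketch works on copies, reflecting the point under a) that the master data remain unchanged.

```python
from datetime import date

# Hypothetical illustration only: field names and rules are assumptions,
# not the schema or logic of the actual DWB tool.
def prepare_for_publication(records, today, anonymize=frozenset(), blur_decimals=1):
    """Return publishable copies of records; the input records are not modified."""
    published = []
    for rec in records:
        # Withhold single data units on request of the data producer.
        if rec.get("withhold"):
            continue
        # Skip units that are still under embargo.
        embargo = rec.get("embargo_until")
        if embargo is not None and embargo > today:
            continue
        out = dict(rec)  # shallow copy, so the source record stays untouched
        # Anonymize selected persons.
        if out.get("collector") in anonymize:
            out["collector"] = "Anonymous"
        # Blur coordinates by reducing their precision.
        if out.get("blur_coordinates"):
            out["latitude"] = round(out["latitude"], blur_decimals)
            out["longitude"] = round(out["longitude"], blur_decimals)
        # Internal control flags are not part of the published package.
        for key in ("withhold", "embargo_until", "blur_coordinates"):
            out.pop(key, None)
        published.append(out)
    return published

# Invented sample data for demonstration.
records = [
    {"id": 1, "collector": "A. Example", "latitude": 48.13713,
     "longitude": 11.57540, "blur_coordinates": True},
    {"id": 2, "collector": "B. Sample", "latitude": 48.7,
     "longitude": 9.2, "withhold": True},
    {"id": 3, "collector": "C. Person", "latitude": 49.0,
     "longitude": 10.0, "embargo_until": date(2030, 1, 1)},
    {"id": 4, "collector": "B. Sample", "latitude": 47.5, "longitude": 10.5},
]
result = prepare_for_publication(
    records, today=date(2018, 9, 1), anonymize={"B. Sample"}
)
```

Because the function is deterministic for a given input and date, rerunning it on the same source data reproduces the same package, which loosely mirrors the reproducible path described under d).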


Weiss, Markus / Weibulat, Tanja / Seifert, Stefan / et al: A flexible Diversity Workbench tool to publish biodiversity data from SQL database networks through platforms like GFBio. Jena 2018.

