Unfolding existing Data Publication Practice in Research Data Workflows in the Biological and Environmental Sciences – First Results from a Survey

In recent years, data publication workflows have received increasing attention [1,2]. In order to obtain FAIR data [3], reviewers, data curators and other stakeholders have realized that not only the submitted data matter, but also the underlying process by which those data are created within existing research practice. A better understanding of existing data publication practices in research workflows will help service providers such as data repositories (Pangaea [4], ENA [5], GenBank [6]) to support their users with more appropriate services and tools when submitting data, and will also sustain the role of data repositories in research practice. Such improved coordination will minimize the workload of researchers and data curators and will make it easier for all stakeholders to review submissions with respect to reproducibility. Furthermore, well-documented data publication workflows may improve data retrieval and, ultimately, data reuse in the long run. One obstacle to comprehensible and properly described research workflows is that data publication workflows in the life sciences are hard to define. Scholars have highly individual disciplinary backgrounds, research skills and experience. In some domains such as biodiversity, scholars work from several weeks to years to collect and analyze often heterogeneous data from various sources, such as collections or environmental and molecular data repositories. Reconstructing their work process after a project is finalized is therefore very difficult, if not impossible. Nevertheless, our goal is to reveal the state of the art in how scholars manage their data in their research practice. We are in the process of setting up a survey whose general structure follows the GFBio Data Lifecycle [7]. The results will allow us to identify typical data practice workflows that can be used to evaluate the suitability of existing data repository portals, such as GFBio [8]. In our talk, we present the first insights from the survey.
KEYWORDS: data publication workflows, data practices, biological and environmental data, green life sciences, biodiversity

REFERENCES:
1. Dallmeier-Tiessen, S., Khodiyar, V., Murphy, F., Nurnberger, A., Raymond, L., Whyte, A., 2017. Connecting Data Publication to the Research Workflow: A Preliminary Analysis, International Journal of Digital Curation, 12, https://doi.org/10.2218/ijdc.v12i1.533
2. González-Beltrán, A., Li, P., Zhao, J., Avila-Garcia, M. S., Roos, M., Thompson, M., van der Horst, E., Kaliyaperumal, R., Luo, R., Lee, T.-L., Lam, T., Edmunds, S. C., Sansone, S.-A., Rocca-Serra, P., 2015. From Peer-Reviewed to Peer-Reproduced in Scholarly Publishing: The Complementary Roles of Data Models and Workflows in Bioinformatics, PLOS ONE 10, 7, pp. 1–20, https://doi.org/10.1371/journal.pone.0127612
3. Wilkinson, M. D. et al., 2016. The FAIR Guiding Principles for scientific data management and stewardship, Scientific Data 3, https://doi.org/10.1038/sdata.2016.18
4. Pangaea, https://www.pangaea.org
5. ENA, https://www.ebi.ac.uk/ena
6. GenBank, https://www.ncbi.nlm.nih.gov/genbank/
7. GFBio Data Lifecycle, https://www.gfbio.org/training/materials/data-lifecycle
8. GFBio, https://www.gfbio.org

