Navigating the long tail - Towards practical guidance for researchers on how to select a repository for long tail data
With nearly 2000 entries in the Registry of Research Data Repositories (re3data.org, November 2017), researchers are confronted with a plethora of repositories in which to deposit research data. Given the diversity of these services, we have noticed that researchers find it challenging to make an informed decision, especially when they are dealing with data from the so-called “long tail” (small, diverse, individual, less standardized data). Although re3data.org provides a very comprehensive list of criteria (i.e. filters) to narrow down the number of choices, advice is still needed, for example, on evaluating the importance of a criterion (e.g. type of repository) or the impact of a certain choice (e.g. which PID?). In this poster presentation, we take the perspective of the research data management helpdesk, a central service facility at the Friedrich Schiller University Jena (Germany), and investigate how we could address this selection challenge. The aim is to develop a practical guide for researchers from domains where there is no obvious choice or well-established repository available (i.e. the long tail) and where researchers rely on general-purpose repositories. In a first step, we compared five generic repositories for long tail data (Figshare, Zenodo, Dryad, RADAR, Digital Library Thuringia) using the individual descriptions and properties on re3data.org. For some criteria, the information content on re3data.org was rather limited, so we also explored the individual websites of the repository providers. For example, the criterion “Quality Management” only says whether a repository provider does quality management, but not what exactly that means. Another example of rather sparse information is the level of data curation available and applied to the data in a certain repository. Such information would be helpful in the evaluation process.
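The comparison step described above can be sketched as a simple criteria matrix that is filtered against a researcher's requirements. The repository names are taken from the text, but all property values and criterion keys below are hypothetical placeholders, not the actual re3data.org records.

```python
# Sketch of a criteria-matrix comparison for candidate repositories.
# All property values are illustrative placeholders, NOT the real
# re3data.org entries; replace them with the actual descriptions.
repositories = {
    "Figshare": {"pid": "DOI", "api": True},
    "Zenodo":   {"pid": "DOI", "api": True},
    "RADAR":    {"pid": "DOI", "api": False},  # placeholder value
}

def matches(props, requirements):
    """Return True if a repository satisfies every requirement."""
    return all(props.get(key) == value for key, value in requirements.items())

# Example: keep only repositories that (in this toy data) assign a
# DOI and offer an API.
requirements = {"pid": "DOI", "api": True}
shortlist = [name for name, props in repositories.items()
             if matches(props, requirements)]
print(shortlist)  # → ['Figshare', 'Zenodo']
```

In practice the matrix would be populated from the re3data.org descriptions and the providers' websites, which is exactly where the sparse entries (e.g. for quality management or curation level) become a limiting factor.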
In a second step, we took a number of real cases from our work at the helpdesk and investigated how well the researchers’ intentions and expectations match the means and information available to evaluate a repository (both on re3data.org and on the repository websites). This matching might be straightforward, for example, if the intention is to make data citable, where one only needs to check whether a PID is provided. But it might be more difficult, for example, if a researcher would like to assess the visibility a dataset may gain from publishing with a certain repository. In this case, one should look at a number of properties (e.g. Metrics, Syndications, API, Licences) with rather technical information.