An Ontology-based Reference Process to Provide Interoperability and Foster Database Integration for Biodiversity Data
Citizen science and scientific and applied research worldwide are continually producing large volumes of biodiversity-related data. This data is an important source of knowledge that can serve several purposes, such as assessing the impacts of climate change on the environment and defining public policies and scenarios for the sustainable use of biodiversity. Unfortunately, the data is collected and stored in different forms, formats and standards. Many researchers organise their data to serve immediate research purposes without taking the time to structure it properly, even though the research community demands that such data be made public to enable reproduction, continuity and better evaluation of research contributions. Free and open-access data facilities, such as GBIF (Global Biodiversity Information Facility) and ALA (Atlas of Living Australia), have been created to store and publish discoverable biodiversity-related data. These facilities usually store data according to standards such as Darwin Core, which both GBIF and ALA use. If the data does not comply with the standards adopted by a facility, a specific database integration solution must be developed. Without a technical solution to store non-standardised data and make it discoverable, some of the original data may become unusable in the process, potentially hindering access to information. Data should be standardised from the earliest stages of a project, but solutions are also required to standardise data that has already been collected. A semantic approach can achieve this goal by applying ontologies to improve the understanding of the available data and metadata. Ontologies have been advocated as a powerful technique for providing interoperability among datasets and information systems.
This paper presents ProSIt, an ontology-based reference process (workflow) that guides the creation of a semantic approach to biodiversity data interoperability based on the semantic integration of biodiversity standards. As a case study to evaluate the reference process, a functional ontology was built to provide interoperability between the ABCD and Darwin Core standards, which are the standards currently recommended by TDWG (Biodiversity Information Standards) and among the most widely adopted worldwide. The ontology and reference process were evaluated and proved effective, representing a promising solution for biodiversity-related data interoperability. The reference process can be applied to other existing standards and to ad hoc databases that can be semantically interpreted, removing the structural barrier that prevents simple access to, and availability of, information across heterogeneous and distributed databases.