Towards Semantic Data Management in LifeWatch Italy: the Phytoplankton Study Case
LifeWatch Italy, the Italian node of LifeWatch-ERIC, has promoted and stimulated the debate on the use of semantics in biodiversity data management. Information on biodiversity and ecosystems is currently very heterogeneous and needs to be better managed in order to improve current scientific knowledge and to address urgent societal challenges concerning environmental issues. Here we present the Phytoplankton Study Case, in which a semantic approach was used to address data harmonisation, integration and discovery. An interdisciplinary team of LifeWatch Italy has developed a thesaurus on phytoplankton functional traits and linked its concepts to other existing conceptual schemas related to this domain. In parallel, the team has produced the LifeWatch Core Ontology, a customisation of the OBOE core ontology, for the semantic description and capture of basic concepts and relationships in ecological studies. This framework ontology is based on seven main concepts (classes): Domain, Entity, Observation, Characteristic, Measurement, Protocol and Standard. It provides a structured yet generic approach for semantic data annotation and for developing domain-specific ecological ontologies such as the Phytoplankton Trait Ontology (PhyTO). To date, the LifeWatch e-Infrastructure stores and manages data and metadata using a mix of database management systems (the relational MySQL and the NoSQL MongoDB); for the purposes of the study case, we selected the Virtuoso triple store as the semantic repository and developed several modules to automate the management workflow. A first software module was developed to annotate data with classes, subclasses and properties of PhyTO (i.e. semantic annotation). The module maps metadata and data stored in the LifeWatch Data Portal to the OWL schema of PhyTO and produces .rdf output files.
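As an illustration of the annotation step, the following minimal sketch maps one tabular record to RDF triples following an OBOE-style Observation/Measurement/Characteristic pattern and serialises them as N-Triples. It is not the LifeWatch module itself: the namespace IRIs, the property names (`ofEntity`, `hasMeasurement`, `ofCharacteristic`, `hasValue`), the class names `Phytoplankton` and `Biovolume`, and the column mapping are all assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical namespaces: the real LifeWatch Core / PhyTO IRIs are not given in the text.
CORE = "http://example.org/lifewatch-core#"
PHYTO = "http://example.org/phyto#"
DATA = "http://example.org/data/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"

# Hypothetical mapping from dataset column names to trait (Characteristic) classes.
COLUMN_TO_CHARACTERISTIC = {
    "biovolume": PHYTO + "Biovolume",
    "cell_length": PHYTO + "CellLength",
}


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def annotate_row(row_id: str, row: Dict[str, float]) -> List[str]:
    """Return N-Triples lines describing one record as an Observation."""
    obs = f"{DATA}observation/{row_id}"
    entity = f"{obs}/entity"
    triples = [
        (iri(obs), iri(RDF_TYPE), iri(CORE + "Observation")),
        (iri(obs), iri(CORE + "ofEntity"), iri(entity)),
        (iri(entity), iri(RDF_TYPE), iri(PHYTO + "Phytoplankton")),
    ]
    for column, value in row.items():
        characteristic = COLUMN_TO_CHARACTERISTIC.get(column)
        if characteristic is None:
            continue  # columns without a mapping are left unannotated
        meas = f"{obs}/measurement/{column}"
        triples += [
            (iri(obs), iri(CORE + "hasMeasurement"), iri(meas)),
            (iri(meas), iri(RDF_TYPE), iri(CORE + "Measurement")),
            (iri(meas), iri(CORE + "ofCharacteristic"), iri(characteristic)),
            (iri(meas), iri(CORE + "hasValue"), f'"{value}"^^{iri(XSD_DOUBLE)}'),
        ]
    return [f"{s} {p} {o} ." for s, p, o in triples]


if __name__ == "__main__":
    for line in annotate_row("r1", {"biovolume": 12.5, "depth": 3.0}):
        print(line)
```

In this sketch, columns that have no entry in the mapping (here `depth`) are skipped, so only values that can be tied to an ontology class end up in the output.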
A second module takes the .rdf files as input and stores the data in the Virtuoso graph to make them available for semantic search. Moreover, a user-friendly search interface (a Java portlet) has been implemented to retrieve annotated data with queries suggested by the data users. This approach facilitates data discovery and integration, and can guide and automate data aggregation and summarisation.
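To give an idea of what a semantic search over the annotated data might look like, the sketch below builds a SPARQL query that retrieves observations whose measurement of a given trait is at or above a threshold. The `core:` namespace and the property names are hypothetical placeholders, not the published LifeWatch Core Ontology vocabulary, and the function only builds the query text; sending it to a triple store's SPARQL endpoint is left out.

```python
# Hypothetical namespace: the real LifeWatch Core Ontology IRI is not given in the text.
CORE = "http://example.org/lifewatch-core#"

# Characters that must not appear inside an IRI reference in SPARQL.
_FORBIDDEN_IRI_CHARS = '<>" {}|\\^`'


def build_trait_query(characteristic_iri: str, min_value: float, limit: int = 100) -> str:
    """Build a SPARQL SELECT query for measurements of one trait >= min_value."""
    if any(ch in characteristic_iri for ch in _FORBIDDEN_IRI_CHARS):
        raise ValueError("characteristic_iri contains characters not allowed in an IRI")
    return f"""PREFIX core: <{CORE}>
SELECT ?observation ?value
WHERE {{
  ?observation core:hasMeasurement ?measurement .
  ?measurement core:ofCharacteristic <{characteristic_iri}> ;
               core:hasValue ?value .
  FILTER (?value >= {float(min_value)})
}}
LIMIT {int(limit)}"""


if __name__ == "__main__":
    print(build_trait_query("http://example.org/phyto#Biovolume", 10))
```

Validating the IRI and coercing the numeric arguments before interpolation keeps user-suggested search terms from altering the structure of the query.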