15 documents found

MARCUS: molecular annotation and recognition for curating unravelled structures

The exponential growth of chemical literature necessitates the development of automated tools for extracting and curating molecular information from unstructured scientific publications into open-access chemical databases. Current optical chemical structure recognition (OCSR) and named entity recognition…
Cambridge: Royal Society of Chemistry, 2025-10-07

Cheminformatics Microservice V3: a web portal for chemical structure manipulation and analysis

The widespread adoption of open-source cheminformatics toolkits remains constrained by technical implementation barriers, including complex installation procedures, dependency management, and integration challenges. Here, we present Cheminformatics Microservice V3, a significant update to the…
London: BioMed Central, 2025-09-23

Cheminformatics Microservice: unifying access to open cheminformatics toolkits

In recent years, cheminformatics has experienced significant advancements through the development of new open-source software tools based on various cheminformatics programming toolkits. However, adopting these toolkits presents challenges, including proper installation, setup, deployment, and…
London: BioMed Central, 2023-10-16

DECIMER.ai: an open platform for automated optical chemical structure identification, segmentation and recognition in scientific…

The number of publications describing chemical structures has increased steadily over the last decades. However, the majority of published chemical information is currently not available in machine-readable form in public databases. It remains a challenge to automate the process of information…
[London]: Springer Nature, 2023-08-19

Open data and algorithms for open science in AI-driven molecular informatics

Recent years have seen a sharp increase in the development of deep learning and artificial intelligence-based molecular informatics. There has been a growing interest in applying deep learning to several subfields, including the digital transformation of synthetic chemistry, extraction of chemical information…
Amsterdam [et al.]: Elsevier BV, 2023-04

DECIMER—hand-drawn molecule images dataset

The translation of images of chemical structures into machine-readable representations of the depicted molecules is known as optical chemical structure recognition (OCSR). There has been a lot of progress over the last three decades in this field, but the development of systems for the recognition…
London: BioMed Central, 2022-06-09

RanDepict: Random chemical structure depiction generator

The development of deep learning-based optical chemical structure recognition (OCSR) systems has led to a need for datasets of chemical structure depictions. The diversity of the features in the training data is an important factor for the generation of deep learning systems that generalise well and…
London: BioMed Central, 2022-06-06

DECIMER 1.0: deep learning for chemical image recognition using transformers

The amount of data available on chemical structures and their properties has increased steadily over the past decades. In particular, articles published before the mid-1990s are available only in printed or scanned form. The extraction and storage of data from those articles in a publicly accessible database…
London: BioMed Central, 2021-08-17

STOUT: SMILES to IUPAC names using neural machine translation

Chemical compounds can be identified through a graphical depiction, a suitable string representation, or a chemical name. A universally accepted naming scheme for chemistry was established by the International Union of Pure and Applied Chemistry (IUPAC) based on a set of rules. Due to the complexity…
London: BioMed Central, 2021-04-27

DECIMER-Segmentation: Automated extraction of chemical structure depictions from scientific literature

Chemistry looks back at many decades of publications on chemical compounds, their structures and properties, in scientific articles. Liberating this knowledge (semi-)automatically and making it available to the world in open-access databases is a current challenge. Apart from mining textual information,…
London: BioMed Central, 2021-03-08

Molecule Set Comparator (MSC): a CDK-based open rich‐client tool for molecule set similarity evaluations

The open rich-client Molecule Set Comparator (MSC) application enables a versatile and fast comparison of large molecule sets with a unique inter-set molecule-to-molecule mapping obtained e.g. by molecular-recognition-oriented machine learning approaches. The molecule-to-molecule comparison is based…
London: BioMed Central, 2021-02-01

COCONUT online: Collection of Open Natural Products database

Natural products (NPs) are small molecules produced by living organisms with potential applications in pharmacology and other industries, as many of them are bioactive. This potential raised great interest in NP research around the world and in different application fields; therefore, over the years a…
London: BioMed Central, 2021-01-10

DARLING: Deep leARning for chemicaL Information processinG

Vast quantities of scientific information are hidden in primary scientific publications and not available as curated data in scientific databases. Making such information publicly available to support open science and open innovation is a challenge that has to be solved. In this dissertation, state-of-the-art…

DECIMER: towards deep learning for chemical image recognition

The automatic recognition of chemical structure diagrams from the literature is an indispensable component of workflows to re-discover information about chemicals and to make it available in open-access databases. Here we report preliminary findings in our development of Deep lEarning for Chemical ImagE…
London: BioMed Central, 2020-10-27

A review of optical chemical structure recognition tools

Structural information about chemical compounds is typically conveyed as 2D images of molecular structures in scientific documents. Unfortunately, these depictions are not a machine-readable representation of the molecules. With a backlog of decades of chemical literature in printed form not properly…
London: BioMed Central, 2020-10-07