Pattern recognition methods for the prediction of chemical structures of fungal secondary metabolites

Non-ribosomal peptide synthetases (NRPSs) are megasynthetases found predominantly in bacteria and fungi. They produce small peptides that serve numerous biological functions and play crucial ecological roles. The adenylation (A) domains of NRPSs catalyze the ATP-dependent activation of substrates bearing a carboxyl group. A-domain substrates include not only proteinogenic amino acids (in both D- and L-forms) but also non-proteinogenic amino acids. Because the substrate repertoire is large and the specificity rules for fungi are not well established, predicting the substrates of fungal A-domains is difficult. In bacteria, ten amino acid residues lining the substrate-binding pocket have been established as the NRPS code, which determines the specificity of A-domains. To study the relationship between fungal A-domains and their specificity, a cluster analysis of NRPS code residues was performed: the residues were encoded by physicochemical properties essential for binding small molecules, and the encoded codes were then clustered. The cluster analysis showed similar NRPS codes in bacteria and fungi for substrates such as α-aminoadipic acid and tryptophan. Fungal NRPS codes for substrates such as tyrosine and proline did not cluster with their bacterial counterparts, which indicates an independent evolution of substrate specificity in fungi. This emphasizes the need for a fungus-specific prediction tool: currently available A-domain substrate specificity prediction tools accurately identify substrates in bacteria but fail to provide correct predictions for fungi. A novel approach to fungal A-domain substrate specificity prediction is presented here. A neural-network-based A-domain substrate specificity classifier (NNassc) was developed using Keras with a TensorFlow backend. NNassc was trained solely on fungal NRPS codes and combines physicochemical and structural features for its specificity predictions. Internal and external validation datasets of experimentally verified NRPS codes were used to assess the performance of NNassc.
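The encoding-and-clustering step described above can be illustrated with a minimal sketch. It assumes an encoding of two features per code position (Kyte-Doolittle hydropathy and formal side-chain charge) and average-linkage agglomerative clustering with Euclidean distance; neither choice is confirmed as the one used in this work, and the helper names `encode` and `average_linkage` as well as the three 10-residue codes are hypothetical, purely for illustration.

```python
import math
from itertools import combinations

# Kyte-Doolittle hydropathy index. This is an assumed, illustrative property
# scale; the physicochemical properties used in the thesis may differ.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}

# Assumed formal side-chain charge near neutral pH (His treated as neutral).
CHARGE = {"D": -1.0, "E": -1.0, "K": 1.0, "R": 1.0}


def encode(code):
    """Encode a 10-residue NRPS code as a flat vector, 2 features per position."""
    if len(code) != 10:
        raise ValueError("an NRPS code has exactly 10 residues")
    features = []
    for residue in code:
        # Gaps ('-') or unknown residues are mapped to 0.0 for both features.
        features.append(HYDROPATHY.get(residue, 0.0))
        features.append(CHARGE.get(residue, 0.0))
    return features


def average_linkage(codes, k):
    """Merge the two closest clusters (mean pairwise Euclidean distance) until k remain."""
    if k < 1:
        raise ValueError("k must be at least 1")
    vectors = [encode(c) for c in codes]
    clusters = [[i] for i in range(len(codes))]
    while len(clusters) > k:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            total = sum(math.dist(vectors[a], vectors[b])
                        for a in clusters[i] for b in clusters[j])
            mean = total / (len(clusters[i]) * len(clusters[j]))
            if best is None or mean < best[0]:
                best = (mean, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so cluster i keeps its index
    return [[codes[idx] for idx in cluster] for cluster in clusters]


# Hypothetical toy codes, not experimentally verified A-domain codes.
toy_codes = ["DAWTIAAICK", "DAWTIAAVCK", "DLRQLGERAK"]
print(average_linkage(toy_codes, k=2))
# prints [['DAWTIAAICK', 'DAWTIAAVCK'], ['DLRQLGERAK']]
```

Encoding each position separately keeps the positional information of the binding pocket, so two codes cluster together only when they are physicochemically similar residue by residue. A real analysis would more likely standardize the features and use an established implementation such as SciPy's hierarchical clustering (`scipy.cluster.hierarchy.linkage` with `method="average"`); the classifier itself (NNassc, built with Keras) is not reproduced here.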
