Identification and analysis of patterns in DNA sequences, the genetic code and transcriptional gene regulation
The present cumulative work consists of six articles linked by the topic ”Identification and Analysis of Patterns in DNA sequences, the Genetic Code and Transcriptional Gene Regulation”. We have applied a binary coding, to efficiently findpatterns within nucleotide sequences. In the first and second part of my work one single bit to encode all four nucleotides is used. The three possibilities of a one - bit coding are: keto (G,U) - amino (A,C) bases, strong (G,C) - weak (A,U) bases, and purines (G,A) - pyrimidines (C,U). We found out that the best pattern could be observed using the purine - pyrimidine coding. Applying this coding we have succeeded in finding a new representation of the genetic code which has been published under the title ”A New Classification Scheme of the Genetic Code” in ”Journal of Molecular Biology” and ”A Purine-Pyrimidine Classification Scheme of the Genetic Code” in ”BIOForum Europe”. This new representation enables to reduce the common table of the genetic code from 64 to 32 fields maintaining the same information content. It turned out that all known and even new patterns of the genetic code can easily be recognized in this new scheme. Furthermore, our new representation allows us for speculations about the origin and evolution of the translation machinery and the genetic code. Thus, we found a possible explanation for the contemporary codon - amino acid assignment and wide support for an early doublet code. Those explanations have been published in ”Journal of Bioinformatics and Computational Biology” under the title ”The New Classification Scheme of the Genetic Code, its Early Evolution, and tRNA Usage”. Assuming to find these purine - pyrimidine patterns at the DNA level itself, we examined DNA binding sites for the occurrence of binary patterns. A comprehensive statistic about the largest class of restriction enzymes (type II) has shown a very distinctive purine - pyrimidine pattern. Moreover, we have observed a higher G+C content for the protein binding sequences. For both observations we have provided and discussed several explanations published under the title ”Common Patterns in Type II Restriction Enzyme Binding Sites” in ”Nucleic Acid Research”. The identified patterns may help to understand how a protein finds its binding site. In the last part of my work two submitted articles about the analysis of Boolean functions are presented. Boolean functions are used for the description and analysis of complex dynamic processes and make it easier to find binary patterns within biochemical interaction networks. It is well known that not all functions are necessary to describe biologically relevant gene interaction networks. In the article entitled ”Boolean Networks with Biologically Relevant Rules Show Ordered Behavior”, submitted to ”BioSystems”, we have shown, that the class of required Boolean functions can strongly be restricted. Furthermore, we calculated the exact number of hierarchically canalizing functions which are known to be biologically relevant. In our work ”The Decomposition Tree for Analysis of Boolean Functions” submitted to ”Journal of Complexity”, we introduced an efficient data structure for the classification and analysis of Boolean functions. This permits the recognition of biologically relevant Boolean functions in polynomial time.