Image-based classification of plant genus and family for trained and untrained plant species
Background: Modern plant taxonomy reflects phylogenetic relationships among taxa based on proposed morphological and genetic similarities. However, taxonomical relation is not necessarily reflected by close overall resemblance, but rather by commonality of very specific morphological characters or similarity on the molecular level. It is an open research question to which extent phylogenetic relations within higher taxonomic levels such as genera and families are reflected by shared visual characters of the constituting species. As a consequence, it is even more questionable whether the taxonomy of plants at these levels can be identified from images using machine learning techniques. Results: Whereas previous studies on automated plant identification from images focused on the species level, we investigated classification at higher taxonomic levels such as genera and families. We used images of 1000 plant species that are representative for the flora of Western Europe. We tested how accurate a visual representation of genera and families can be learned from images of their species in order to identify the taxonomy of species included in and excluded from learning. Using natural images with random content, roughly 500 images per species are required for accurate classification. The classification accuracy for 1000 species amounts to 82.2% and increases to 85.9% and 88.4% on genus and family level. Classifying species excluded from training, the accuracy significantly reduces to 38.3% and 38.7% on genus and family level. Excluded species of well represented genera and families can be classified with 67.8% and 52.8% accuracy. Conclusion: Our results show that shared visual characters are indeed present at higher taxonomic levels. Most dominantly they are preserved in flowers and leaves, and enable state-of-the-art classification algorithms to learn accurate visual representations of plant genera and families. Given a sufficient amount and composition of training data, we show that this allows for high classification accuracy increasing with the taxonomic level and even facilitating the taxonomic identification of species excluded from the training process.