Extracting Trait Data from Digitized Herbarium Specimens Using Deep Convolutional Networks
Herbarium collections have been the foundation of taxonomic research for centuries and have become increasingly important for related fields such as plant ecology and biogeography. Herbaria worldwide are estimated to hold c. 400 million specimens; through their type specimens they cover, with few exceptions, all known plant taxa (c. 350,000 species) and have a temporal depth matched by only a few other botanical data sources. Presently, c. 13.5 million digitized herbarium specimens are available online via institutional websites or aggregators such as GBIF. We used these specimen images, in combination with morphological trait data obtained from TRY and the FLOPO knowledge base, to train deep convolutional networks to recognize these traits, as well as phenological states, from specimen images. To improve trait recognition, we expanded our approach to include high-resolution scans, enabling fine-grained feature extraction. Furthermore, we analyze how recognizability differs across trait groups (e.g. leaf traits) and higher taxa. Newly mobilized trait data will be used to improve our trait databases. Our approach is described in detail, and performance in the recognition of different traits is analyzed and discussed.
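The training setup described above can be sketched as a multi-label image classification problem: each specimen image is paired with a binary vector indicating which traits (drawn from TRY/FLOPO labels) it exhibits. The following is a minimal, hypothetical PyTorch sketch, not the authors' actual architecture; the network size, number of trait labels, and image resolution are illustrative assumptions.

```python
import torch
import torch.nn as nn

NUM_TRAITS = 8  # assumed number of trait labels; illustrative only


class TraitCNN(nn.Module):
    """Small convolutional network emitting one logit per trait label."""

    def __init__(self, num_traits: int = NUM_TRAITS):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global pooling, resolution-agnostic
        )
        self.head = nn.Linear(32, num_traits)

    def forward(self, x):
        z = self.features(x).flatten(1)
        return self.head(z)  # raw logits; sigmoid is applied inside the loss


model = TraitCNN()
images = torch.randn(4, 3, 64, 64)  # stand-in for digitized specimen scans
labels = torch.randint(0, 2, (4, NUM_TRAITS)).float()  # trait presence/absence

# Multi-label objective: independent sigmoid per trait.
loss = nn.BCEWithLogitsLoss()(model(images), labels)
loss.backward()  # gradients for one training step
```

Because each trait is scored independently, a specimen can be predicted to show several traits at once, which matches the multi-trait nature of the database labels; a production model would likely start from a pretrained backbone rather than this tiny network.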