Background
Deep learning algorithms for automated plant identification need large quantities of precisely labelled images in order to produce reliable classification results. Here, we explore which perspectives, and which combinations of them, contain more characteristic information and therefore allow for higher identification accuracy.

Results
We developed an image-capturing scheme to create observations of flowering plants. Each observation comprises five in-situ images of the same individual from predefined perspectives (entire plant, flower frontal and lateral views, leaf top and back side views). We collected a completely balanced dataset comprising 100 observations for each of 101 species, with an emphasis on groups of congeneric and visually similar species, including twelve Poaceae species. We used this dataset to train convolutional neural networks and to determine the prediction accuracy for each single perspective and for their combinations via score-level fusion. Top-1 accuracies ranged between 77% (entire plant) and 97% (fusion of all perspectives) when averaged across species. Among the single perspectives, the flower frontal view achieved the highest accuracy (88%). Fusing the flower frontal, flower lateral and leaf top views yielded the most reasonable compromise between acquisition effort and accuracy (96%). The perspective achieving the highest accuracy was species dependent.

Conclusions
We argue that image databases of herbaceous plants would benefit from multi-organ observations comprising at least the frontal and lateral perspectives of flowers and the leaf top view.
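
The score-level fusion mentioned in the Results can be illustrated with a minimal sketch. It assumes that each perspective's network outputs one softmax score per species and that the scores are combined by an unweighted mean. The abstract does not state the fusion rule, so the mean rule, the function names (`fuse_scores`, `top1`) and the example scores are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def fuse_scores(scores_by_view: Dict[str, List[float]]) -> List[float]:
    """Average per-class scores over all perspectives (assumed unweighted mean rule)."""
    views = list(scores_by_view.values())
    if not views:
        raise ValueError("need at least one perspective")
    n_classes = len(views[0])
    if any(len(v) != n_classes for v in views):
        raise ValueError("all score vectors must have the same length")
    # One fused score per class: the mean of that class's score across perspectives.
    return [sum(v[c] for v in views) / len(views) for c in range(n_classes)]


def top1(scores: List[float]) -> int:
    """Return the index of the class with the highest score."""
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical softmax outputs for three species from three perspectives
# of a single observation (made-up numbers, not data from the study).
observation = {
    "flower_frontal": [0.50, 0.40, 0.10],
    "flower_lateral": [0.30, 0.60, 0.10],
    "leaf_top": [0.20, 0.70, 0.10],
}

fused = fuse_scores(observation)
predicted = top1(fused)
```

In this made-up example the flower frontal view alone would favour species 0, whereas the fused scores favour species 1, which shows how additional perspectives can change the top-1 prediction.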