Digitisation

We are a world leader in exploring innovative ways of digitising our 3 million herbarium specimens.

Research and Development

Digitisation workflow

We have developed an integrated workflow for the digitisation of herbarium specimens. It is modular and scalable, so that a single overall workflow can be used for all digitisation projects. Where possible we have incorporated automated systems to expand and speed up the digitisation process. There are three main elements: a specimen workflow, a data workflow and an image workflow.

  • The specimen workflow involves the selection and preparation of specimens and folders and is closely linked to workflows for [loans], incoming specimens, [destructive sampling] and curation.
  • The image workflow incorporates image capture, processing, image management data recording, optical character recognition (OCR), quality control, image streaming online and archiving. Our automated image processing system allows images from several different imaging methods to be handled via a single system, based around a dropbox folder structure. The folders are ordered in a structured hierarchy, which provides basic image management data, including the equipment used and the operator’s name. This information is written to the image management database, creating the metadata for each file automatically. The system also creates the necessary derivatives for serving the images to the web, links the images to the database record and sends a copy of the file to our OCR workflow.
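
As an illustration of how a folder hierarchy can carry image management data, the following minimal Python sketch derives metadata from a file's location. The layout (root / equipment / operator / barcode-named file), the function name and the example values are assumptions made for illustration only; they do not describe our actual folder structure or database fields.

```python
from pathlib import PurePosixPath


def metadata_from_path(path):
    """Derive basic image management metadata from a drop-folder path.

    Hypothetical layout: <root>/<equipment>/<operator>/<barcode>.<ext>
    """
    p = PurePosixPath(path)
    if len(p.parts) < 3:
        raise ValueError(f"path too shallow to carry metadata: {path}")
    # The two folders directly above the file name the equipment and operator.
    equipment, operator = p.parts[-3], p.parts[-2]
    return {
        "equipment": equipment,
        "operator": operator,
        "barcode": p.stem,  # file name without its extension
        "file_format": p.suffix.lstrip(".").lower(),
    }


# Example with made-up folder names and barcode.
record = metadata_from_path("dropbox/camera_1/jsmith/E00123456.TIF")
```

Because the metadata comes from where the file was dropped rather than from manual entry, the operator only has to save the image in the right folder.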

The data workflow includes all elements of capturing and managing data associated with specimens. This databasing process is primarily focussed on the curatorial data – the geographic filing area and filing name, which are shared by all specimens within a folder. A form in the data management software allows this information to be entered once and multiple records to be created by scanning specimen barcodes. These minimally databased records can be enhanced using various methods including the use of OCR and citizen science projects.
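
The "enter once, scan many" idea behind the data entry form can be sketched as follows. This is a simplified illustration, not our data management software: the record fields, the function name, and the normalisation and de-duplication of scanned barcodes are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecimenRecord:
    barcode: str
    filing_region: str  # geographic filing area shared by the folder
    filing_name: str    # filing name shared by the folder


def create_folder_records(filing_region, filing_name, scanned_barcodes):
    """Create one minimal record per scanned barcode, reusing the folder data."""
    seen = set()
    records = []
    for raw in scanned_barcodes:
        barcode = raw.strip().upper()  # assumed normalisation of scanner input
        if not barcode or barcode in seen:
            continue  # skip empty reads and accidental double scans
        seen.add(barcode)
        records.append(SpecimenRecord(barcode, filing_region, filing_name))
    return records


# Example with made-up values: four scanner reads, two distinct specimens.
records = create_folder_records(
    "Region A", "Quercus robur", ["E00000001", "e00000001 ", "E00000002", ""]
)
```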

Optical Character Recognition

Optical Character Recognition (OCR) has been part of our image workflow since 2012, and all specimen images are run through this process. We continue to seek inventive ways to use our OCR output to enhance our digitisation workflow.

Preparation of specimens for manual and semi-automatic data entry

Currently we use OCR to sort specimens prior to databasing (i.e. by collector and country) or to enhance records that have been minimally databased.
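
One simple way to sort specimens from OCR output is to look for known terms, such as country names, in the recognised text. The sketch below assumes that approach; the function name, the keyword list and the sample label text are hypothetical, and plain substring matching will miss misread or abbreviated labels.

```python
from collections import defaultdict


def sort_by_keyword(ocr_texts, keywords):
    """Group barcodes by the first keyword found in each specimen's OCR text.

    ocr_texts maps barcode -> OCR text; specimens with no hit go to "unsorted".
    """
    groups = defaultdict(list)
    for barcode, text in ocr_texts.items():
        lowered = text.lower()
        match = next((k for k in keywords if k.lower() in lowered), "unsorted")
        groups[match].append(barcode)
    return dict(groups)


# Example with made-up OCR text.
texts = {
    "E00000001": "Flora of NEPAL. Coll. A. Smith",
    "E00000002": "Chile, Prov. Valdivia",
    "E00000003": "illegible",
}
sorted_specimens = sort_by_keyword(texts, ["Nepal", "Chile"])
```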

Enhancement of Quality Control procedures

More recently we have explored the potential of OCR in our Quality Control processes. OCR records make it possible to check that the barcode used in each image filename matches the one read by the OCR software from the specimen image, helping us to correct camera operator errors.
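
The check described above can be sketched in a few lines of Python. The barcode pattern (the letter E followed by eight digits), the function name and the three result labels are assumptions for illustration; a real check would also need to tolerate OCR misreads of individual characters.

```python
import re
from pathlib import PurePosixPath

# Assumed barcode format: "E" followed by eight digits.
BARCODE_PATTERN = re.compile(r"\bE\d{8}\b")


def check_barcode(image_filename, ocr_text):
    """Compare the barcode in the filename with barcodes found in the OCR text."""
    expected = PurePosixPath(image_filename).stem.upper()
    found = set(BARCODE_PATTERN.findall(ocr_text.upper()))
    if not found:
        return "no_barcode_read"  # OCR gave nothing to compare against
    return "match" if expected in found else "mismatch"


# Example with made-up values: the file was named with the wrong barcode.
status = check_barcode("E00123457.tif", "Royal Botanic Garden E00123456")
```

Images flagged as a mismatch can then be reviewed by hand, while those with no barcode read remain unverified rather than being treated as errors.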

Citizen Science

We have been exploring several different Citizen Science platforms to help with the transcription of our specimens. These include Herbaria@Home, DigiVol, and Notes from Nature. We are also in the process of developing internal Citizen Science projects, one for a basic sort of specimens and one for the transcription of label data.

Stable URLs

Linking Collections

We are developing our catalogue to include links to the living collection and images of the plant in the field.

Publications

Haston, E.M.; Cubey, R.W.N.; Pullan, M.; Atkins, H. & Harris, D.J. (2012). Developing integrated workflows for the digitisation of herbarium specimens using a modular and scalable approach. ZooKeys, 209: 93-102. DOI: 10.3897/zookeys.209.3121

Haston, E.M.; Cubey, R.W.N. & Harris, D.J. (2012). Data concepts and their relevance for data capture in large scale digitisation of biological collections. International Journal of Humanities and Arts Computing, 6(1-2): 111-119. DOI: 10.3366/ijhac.2012.004

Drinkwater, R.E.; Cubey, R.W.N. & Haston, E.M. (2014). The use of Optical Character Recognition (OCR) in the digitisation of herbarium specimen labels. PhytoKeys, 38: 15-30. DOI: 38.7168
