Technology

Optical Character Recognition (OCR)

Optical Character Recognition is a process by which software reads a page image and translates it into a text file by recognising the shapes of the letters. All the text in this collection has been automatically generated using OCR software. It has not been manually reviewed or corrected. To look at the OCR text for a report, click on the "View computer-generated text" link in the report.

OCR mistakes

OCR enables searching of large quantities of full-text data, but it is never 100% accurate. For the AtoJs Online project, the level of accuracy depends on the print quality of the original volume, its condition at the time of digitisation, and the level of detail captured by the scanner. Volumes with poor-quality paper, small print, mixed fonts, multi-column layouts, or damaged pages may have poor OCR accuracy. This means that most pages will have some errors in the computer-generated text, and some will have many.
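
As an illustration of what "accuracy" can mean here, the sketch below compares a piece of OCR text with a hand-corrected version of the same line and reports the share of characters that came out right. It is only a minimal example of one common measure (character accuracy based on edit distance), not a description of how this collection was assessed. The function names and sample strings are invented for the illustration, and the comparison needs a manually corrected reference text, which this collection does not include.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, or substitutions needed to turn a into b."""
    # Keep the shorter string in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, ocr_text: str) -> float:
    """Return the fraction of reference characters recognised correctly,
    as a value between 0.0 and 1.0."""
    if not reference:
        return 1.0 if not ocr_text else 0.0
    errors = levenshtein(reference, ocr_text)
    return max(0.0, 1.0 - errors / len(reference))


if __name__ == "__main__":
    # Hypothetical sample: typical OCR confusions such as o/0, h/b, and m/rn.
    corrected = "Report of the Committee"
    ocr_output = "Rep0rt of tbe Comrnittee"
    print(f"Character accuracy: {character_accuracy(corrected, ocr_output):.1%}")
```

A figure like this is measured per character, so even a page with a high score can contain several misspelt words, and a search for any one of them will miss that page.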

Veridian Digital Library Software

Veridian is computer software for making digital collections available in full-text searchable form over the Internet. It is designed specifically to support collections of digitised printed materials (e.g. newspapers, books, and journals), and to take advantage of the latest technologies used in large digitisation projects.

Veridian was developed in New Zealand by DL Consulting Ltd, using the Greenstone digital library software. Greenstone was developed by the New Zealand Digital Library Project at the University of Waikato, and is distributed in cooperation with UNESCO and the Human Info NGO.

Veridian homepage

Greenstone homepage