Washington Database

Data Set

The Washington database was created from the George Washington Papers at the Library of Congress and has the following characteristics:

  • 18th century
  • English language
  • two writers
  • longhand script
  • ink on paper

The original manuscript images [4] have already been used, for example, by Rath and Manmatha in [3]. The Washington database contains our own text line and word images along with their transcriptions. Altogether, the manuscript data consists of:

  • binarized and normalized text line images
  • binarized and normalized word images

The ground truth contains:

  • transcription at line-level
  • transcription at word-level

Statistics

The Washington database includes:

  • 20 pages
  • 656 text lines
  • 4,894 word instances
  • 1,471 word classes
  • 82 letters

Download

If you have not already done so, please register before downloading the database. Once registered, you can download the Washington database here:

The archive contains a README file with detailed information about the data formats used. We also provide the training, validation, and test set IDs that were used, for example, in [1] and [2].
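To reproduce a given experimental setup, the samples can be partitioned using the provided training, validation, and test set IDs. The sketch below assumes, hypothetically, that each ID list is a text file with one ID per line and that the samples have already been loaded into a dictionary keyed by ID. The function names `load_id_list` and `split_samples` are illustrative only. The function also checks that no ID appears in more than one set and that every listed ID exists.

```python
def load_id_list(lines):
    """Read a HYPOTHETICAL ID list: one sample ID per line, blank lines ignored."""
    return [line.strip() for line in lines if line.strip()]


def split_samples(samples, split_ids):
    """Partition samples (dict: id -> data) according to named ID lists.

    Raises ValueError if an ID is assigned to more than one set and
    KeyError if an ID does not occur in the samples.
    """
    seen = set()
    splits = {}
    for name, ids in split_ids.items():
        overlap = seen.intersection(ids)
        if overlap:
            raise ValueError(f"IDs assigned to more than one set: {sorted(overlap)}")
        seen.update(ids)
        missing = [i for i in ids if i not in samples]
        if missing:
            raise KeyError(f"{name}: unknown IDs {missing}")
        splits[name] = [(i, samples[i]) for i in ids]
    return splits
```

Keeping the ID lists separate from the data, instead of copying images into per-set folders, makes it straightforward to switch between the different splits used in [1] and [2].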

Terms of Use

The Washington database may be used for non-commercial research and teaching purposes only. If you publish scientific work based on the Washington database, please include a reference to our paper [1]: A. Fischer, A. Keller, V. Frinken, and H. Bunke: "Lexicon-Free Handwritten Word Spotting Using Character HMMs," in Pattern Recognition Letters, Volume 33(7), pages 934-942, 2012.

References

Printed versions of the papers are linked by DOI. Additionally, we provide accepted preprint versions as PDFs. The preprints are intended for convenient online browsing only.

[1] A. Fischer, A. Keller, V. Frinken, and H. Bunke: "Lexicon-Free Handwritten Word Spotting Using Character HMMs," in Pattern Recognition Letters, Volume 33(7), pages 934-942, 2012. [doi] [pdf]

[2] V. Frinken, A. Fischer, R. Manmatha, and H. Bunke: "A Novel Word Spotting Method Based on Recurrent Neural Networks," in IEEE Trans. PAMI, Volume 34(2), pages 211-224, 2012. [doi] [pdf]

[3] T. M. Rath and R. Manmatha: "Word Spotting for Historical Documents," in Int. Journal on Document Analysis and Recognition, Volume 9, pages 139-152, 2007.

[4] George Washington Papers at the Library of Congress, 1741-1799, Series 2, Letterbook 1, pages 270-279 and 300-309.

 
