String Edit Distance Matrices for Various Datasets
A number of string edit distance matrices are made available on this site. First, patterns from various datasets have been converted into strings. Next, all strings of a dataset have been matched against each other, resulting in one edit distance matrix per configuration and dataset. The edit distance of two strings is the minimum total cost of the edit operations (insertions, deletions, and substitutions of symbols) needed to transform one string into the other. It thus reflects the structural dissimilarity of strings: a low distance corresponds to similar strings and a high distance to dissimilar strings. The edit distance data can be used for evaluating new classification methods and clustering procedures in structural pattern recognition.
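To illustrate how such a matrix is obtained and used, here is a minimal sketch. This page does not state which edit costs were used (that is described in the PDF documentation), so the sketch assumes plain Levenshtein distance with unit costs. The function names `edit_distance`, `distance_matrix` and `loo_nn_accuracy` are hypothetical and not part of the distributed data. The last function shows one typical evaluation: leave-one-out nearest-neighbor classification on a precomputed distance matrix.

```python
from typing import List, Sequence


def edit_distance(a: str, b: str, ins_cost: float = 1.0,
                  del_cost: float = 1.0, sub_cost: float = 1.0) -> float:
    """Levenshtein-style edit distance computed by dynamic programming.

    Assumption: constant costs per operation (unit costs by default).
    """
    # prev[j] holds the distance between a[:i-1] and b[:j]
    prev = [j * ins_cost for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * del_cost] + [0.0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0.0 if a[i - 1] == b[j - 1] else sub_cost
            curr[j] = min(prev[j] + del_cost,      # delete a[i-1]
                          curr[j - 1] + ins_cost,  # insert b[j-1]
                          prev[j - 1] + cost)      # substitute or match
        prev = curr
    return prev[-1]


def distance_matrix(strings: Sequence[str]) -> List[List[float]]:
    """Match every pair of strings and collect the distances in a matrix.

    With the default unit costs the distance is symmetric, so only the
    upper triangle is computed and mirrored.
    """
    n = len(strings)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = edit_distance(strings[i], strings[j])
    return d


def loo_nn_accuracy(d: List[List[float]], labels: Sequence[str]) -> float:
    """Leave-one-out 1-nearest-neighbor accuracy on a precomputed matrix.

    Requires at least two patterns.
    """
    n = len(labels)
    correct = 0
    for i in range(n):
        # nearest neighbor of pattern i among all other patterns
        nn = min((j for j in range(n) if j != i), key=lambda j: d[i][j])
        correct += labels[nn] == labels[i]
    return correct / n


if __name__ == "__main__":
    strings = ["abcd", "abce", "xyzw", "xyzz"]
    labels = ["A", "A", "B", "B"]
    D = distance_matrix(strings)
    print(edit_distance("kitten", "sitting"))  # 3.0
    print(loo_nn_accuracy(D, labels))          # 1.0
```

With one of the downloaded matrices, only the evaluation step would be needed, since the pairwise distances are already computed.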
The string datasets and the edit distance matrices have been prepared and computed between 2004 and 2005 by Barbara Spillmann and Michel Neuhaus.
Download the documentation: Description of the Distance Matrices (PDF)
- File Format of the Distance Matrix Files (.dm files)
- Chicken Pieces Silhouettes Database
- Copenhagen Chromosome Database
- Toolset Database
- Pen-Based Recognition of Handwritten Digits (Original, unnormalized version)
- Sea Animal Database
- Folder Structure
To download a distance matrix file, right-click its link and choose "Save link as...".
- Chicken pieces dataset (25 MB)
- Copenhagen chromosome dataset (97 MB)
- Pendigits angle dataset (1.9 GB)
- Pendigits vector dataset (531 MB)
- Rutgers University tool dataset (270 KB)
- Sea Animals dataset (75 MB)