Normalized Levenshtein distance
Computes the Levenshtein (edit) distance between two strings and divides it by the length of the longer string.
Characteristics
This distance measure is normalized, i.e., all distances are between 0 (exact match) and 1 (no similarity).
Compares single values (as opposed to sequences of values). If multiple values are provided, all values are compared and the lowest distance is returned.
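The behaviour described above can be sketched in a few lines of Python. This is a minimal illustration, not the actual implementation: the function names `levenshtein`, `normalized_levenshtein` and `distance` are hypothetical, and the handling of two empty strings is an assumption.

```python
from itertools import product


def levenshtein(a: str, b: str) -> int:
    """Plain dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]


def normalized_levenshtein(a: str, b: str) -> float:
    """Edit distance divided by the length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0  # assumption: two empty strings count as an exact match
    return levenshtein(a, b) / longest


def distance(source_values, target_values) -> float:
    """Lowest distance over all source/target pairs (both lists non-empty)."""
    return min(normalized_levenshtein(s, t)
               for s, t in product(source_values, target_values))
```

Called as `distance(["John"], ["Jxhn"])`, the sketch compares every source value with every target value and keeps the smallest result, which mirrors the "lowest distance is returned" rule for multiple values.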
Examples
Notation: Lists of values are represented via square brackets. Example: [first, second]
represents a list of two values, “first” and “second”.
Returns 0 for equal strings:
- Input values:
  - Source: [John]
  - Target: [John]
- Returns: 0.0
Returns 1/4 if two strings of length 4 differ by one edit operation:
- Input values:
  - Source: [John]
  - Target: [Jxhn]
- Returns: 0.25
Normalizes the edit distance by the length of the longer string:
- Input values:
  - Source: [John]
  - Target: [Jhn]
- Returns: 0.25
Returns the maximum distance of 1 for completely different strings:
- Input values:
  - Source: [John]
  - Target: [Clara]
- Returns: 1.0
Parameter
Q-grams size
The size of the q-grams to be indexed. Setting this to zero will disable indexing.
- Datatype: int
- Default Value: 2
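To illustrate what a q-gram is, the following sketch splits a string into its overlapping substrings of length q. It is only an illustration under stated assumptions: the helper name `qgrams` is hypothetical, and how the index is actually built (for example, whether strings are padded) is not specified here.

```python
from typing import Set


def qgrams(value: str, q: int = 2) -> Set[str]:
    """Return the set of overlapping substrings of length q."""
    if q <= 0:
        return set()  # assumption: a size of zero means nothing is indexed
    return {value[i:i + q] for i in range(len(value) - q + 1)}
```

With the default size of 2, `qgrams("John")` yields `{"Jo", "oh", "hn"}`. Strings that are only a few edits apart usually share some q-grams ("John" and "Jxhn" share "hn"), which is the general idea that makes q-grams useful for finding comparison candidates.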
Min char
The minimum character that is used for indexing.
- Datatype: char
- Default Value: 0
Max char
The maximum character that is used for indexing.
- Datatype: char
- Default Value: z