
Normalized Levenshtein distance

Computes the Levenshtein (edit) distance between two strings and divides it by the length of the longer string.

Characteristics

This distance measure is normalized, i.e., all distances are between 0 (exact match) and 1 (no similarity).

Compares single values (as opposed to sequences of values). If multiple values are provided, all values are compared and the lowest distance is returned.
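The behaviour described above can be sketched in a few lines of Python. This is only an illustration, not the implementation behind this distance measure: the function names `levenshtein`, `normalized_levenshtein`, and `distance` are hypothetical, the treatment of two empty strings is an assumption, and the sketch expects both value lists to be non-empty.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the single-character insertions, deletions, and substitutions
    needed to turn a into b (classic dynamic-programming formulation)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


def normalized_levenshtein(a: str, b: str) -> float:
    """Edit distance divided by the length of the longer string."""
    longer = max(len(a), len(b))
    if longer == 0:
        return 0.0  # assumption: two empty strings count as an exact match
    return levenshtein(a, b) / longer


def distance(source: list[str], target: list[str]) -> float:
    """Lowest normalized distance over all source/target value pairs.
    Assumes both lists contain at least one value."""
    return min(normalized_levenshtein(s, t) for s in source for t in target)
```

Calling `distance(["John"], ["Jxhn"])` divides one edit by a length of 4, and `distance(["John", "Jon"], ["Jxhn", "John"])` is 0 because one of the pairs matches exactly.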

Examples

Notation: Lists of values are written in square brackets. Example: [first, second] represents a list of the two values “first” and “second”.


Returns 0 for equal strings:

  • Input values:

    • Source: [John]
    • Target: [John]
  • Returns: 0.0


Returns 1/4 if two strings of length 4 differ by one edit operation:

  • Input values:

    • Source: [John]
    • Target: [Jxhn]
  • Returns: 0.25


Normalizes the edit distance by the length of the longer string:

  • Input values:

    • Source: [John]
    • Target: [Jhn]
  • Returns: 0.25


Returns the maximum distance of 1 for completely different strings:

  • Input values:

    • Source: [John]
    • Target: [Clara]
  • Returns: 1.0

Parameters

Q-grams size

The size of the q-grams to be indexed. Setting this to zero will disable indexing.

  • Datatype: int
  • Default Value: 2
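For readers unfamiliar with the term, a q-gram is a substring of q consecutive characters. The sketch below only shows how a string breaks into q-grams; the helper name `qgrams` is hypothetical, and how the index is actually built from them (padding, character ranges, lookup) is not described on this page and is not modelled here.

```python
def qgrams(value: str, q: int = 2) -> list[str]:
    """Return all overlapping substrings of length q, in order of appearance."""
    if q <= 0:
        # Assumption of this sketch: a size of zero yields nothing to index.
        return []
    return [value[i:i + q] for i in range(len(value) - q + 1)]
```

With the default size of 2, `qgrams("John")` yields `["Jo", "oh", "hn"]`, while a string shorter than q yields no q-grams at all.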

Min char

The minimum character that is used for indexing.

  • Datatype: char
  • Default Value: 0

Max char

The maximum character that is used for indexing.

  • Datatype: char
  • Default Value: z
