Jaccard¤
Jaccard similarity coefficient. Divides the matching tokens by the number of distinct tokens from both inputs.
Characteristics¤
This distance measure is normalized, i.e., all distances are between 0 (exact match) and 1 (no similarity).
Compares sets of multiple values.Typically, incoming values are tokenized before being fed into this measure.
Examples¤
Notation: List of values are represented via square brackets. Example: [first, second]
represents a list of two values “first” and “second”.
Returns 0 for equal sets of values:
-
Input values:
- Source:
[A, B, C]
- Target:
[B, C, A]
- Source:
-
Returns:
0.0
Returns 1 if there is no overlap between both sets of tokens:
-
Input values:
- Source:
[A, B, C]
- Target:
[D, E, F]
- Source:
-
Returns:
1.0
Returns 0.5 if half of all unique tokens overlap:
-
Input values:
- Source:
[A, B, C]
- Target:
[A, B, D]
- Source:
-
Returns:
0.5
Returns 2/3 if one third of all unique tokens overlap:
-
Input values:
- Source:
[John, Jane]
- Target:
[John, Max]
- Source:
-
Returns:
0.6666666666666666
Parameter¤
None