Distance Measures¤
Distance Measures compute a distance metric between two sets of strings.
Intended audience: Linked Data Experts and Domain Experts
Name | Description |
---|---|
CJK reading distance | CJK Reading Distance. |
Compare physical quantities | Computes the distance between two physical quantities. The distance is normalized to the SI base unit of the dimension. For instance for lengths, the distance will be in metres. Comparing incompatible units will yield a validation error. |
Constant similarity value | Always returns a constant similarity value. |
Cosine | Cosine Distance Measure. |
Date | The distance in days between two dates (‘YYYY-MM-DD’ format). |
DateTime | Distance between two date time values (xsd:dateTime format) in seconds. |
Dice coefficient | Dice similarity coefficient. |
Geographical distance | Computes the geographical distance between two points. Author: Konrad Höffner (MOLE subgroup of Research Group AKSW, University of Leipzig) |
Greater than | Checks if the source value is greater than the target value. If both strings are numbers, numerical order is used for comparison. Otherwise, alphanumerical order is used. |
Inequality | Returns success if values are not equal, failure otherwise. |
Inside numeric interval | Checks if a number is contained inside a numeric interval, such as ‘1900 - 2000’. |
Is substring | Checks if a source value is a substring of a target value. |
Jaccard | Jaccard similarity coefficient. Divides the matching tokens by the number of distinct tokens from both inputs. |
Jaro distance | Matches strings based on the Jaro distance metric. |
Jaro-Winkler distance | Matches strings based on the Jaro-Winkler distance measure. |
Korean phoneme distance | Korean phoneme distance. |
Korean translit distance | Transliterated Korean distance. |
Levenshtein distance | Levenshtein distance. Returns a distance value between zero and the size of the string. |
Lower than | Checks if the source value is lower than the target value. |
Normalized Levenshtein distance | Normalized Levenshtein distance. Divides the edit distance by the length of the longer string. |
Numeric equality | Compares values numerically instead of their string representation as the ‘String Equality’ operator does. Allows to set the needed precision of the comparison. A value of 0.0 means that the values must represent exactly the same (floating point) value, values higher than that allow for a margin of tolerance. |
Numeric similarity | Computes the numeric distance between two numbers. |
qGrams | String similarity based on q-grams (by default q=2). |
Relaxed equality | Return success if strings are equal, failure otherwise. Lower/upper case and differences like ö/o, n/ñ, c/ç etc. are treated as equal. |
Soft Jaccard | Soft Jaccard similarity coefficient. Same as Jaccard distance but values within an levenhstein distance of ‘maxDistance’ are considered equivalent. |
Starts with | Returns success if the first string starts with the second string, failure otherwise. |
String equality | Checks for equality of the string representation of the given values. Returns success if string values are equal, failure otherwise. For a numeric comparison of values use the ‘Numeric Equality’ comparator. |
Substring comparison | Return 0 to 1 for strong similarity to weak similarity. Based on the paper: Stoilos, Giorgos, Giorgos Stamou, and Stefanos Kollias. “A string metric for ontology alignment.” The Semantic Web-ISWC 2005. Springer Berlin Heidelberg, 2005. 624-637. |
Token-wise distance | Token-wise string distance using the specified metric. |