
Token-wise distance¤

Token-wise string distance using the specified metric.

Characteristics¤

This distance measure is normalized, i.e., all distances are between 0 (exact match) and 1 (no similarity).

Compares single values (as opposed to sequences of values). If multiple values are provided, all values are compared and the lowest distance is returned.
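The two characteristics above can be illustrated with a minimal sketch. It is not the operator's implementation: a character-level Levenshtein distance stands in for the token-wise measure, normalizing by the longer string's length is an assumption, and the function names `normalized_distance` and `lowest_distance` are hypothetical.

```python
from itertools import product


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance, computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalized_distance(a: str, b: str) -> float:
    """Scale the edit distance into [0, 1].

    Dividing by the longer length is an assumption made for this sketch.
    """
    longest = max(len(a), len(b))
    return 0.0 if longest == 0 else levenshtein(a, b) / longest


def lowest_distance(source_values, target_values) -> float:
    """Compare every source value with every target value and keep the minimum."""
    return min(normalized_distance(s, t)
               for s, t in product(source_values, target_values))
```

With several values on one side, a single exact match is enough to yield a distance of 0, because only the lowest pairwise distance is returned.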

Parameters¤

Ignore case¤

No description

  • Datatype: boolean
  • Default Value: true

Metric name¤

No description

  • Datatype: string
  • Default Value: levenshtein

Split regex¤

No description

  • Datatype: string
  • Default Value: [\s\d\p{Punct}]+

Stopwords¤

No description

  • Datatype: string
  • Default Value: None

Stopword weight¤

Weight assigned to stopwords

  • Datatype: double
  • Default Value: 0.01

Non stopword weight¤

Weight assigned to non-stopwords

  • Datatype: double
  • Default Value: 0.1

Use incremental idf weights¤

Use incremental IDF weights

  • Datatype: boolean
  • Default Value: false

Match threshold¤

No description

  • Datatype: double
  • Default Value: 0.0

Ordering impact¤

No description

  • Datatype: double
  • Default Value: 0.0

Adjust by token length¤

No description

  • Datatype: boolean
  • Default Value: false
