Calculate a variety of similarity and distance metrics

do_compare(x, y, method = c("cosine", "jaccard", "dice", "gilbertwells",
  "dennis", "sorgenfrei", "lancewilliams", "euclid", "hamming"))

# S4 method for Fingerprint
do_compare(x, y, method = c("cosine", "jaccard",
  "dice", "gilbertwells", "dennis", "sorgenfrei", "lancewilliams", "euclid",
  "hamming"))

# S4 method for dgCMatrix
do_compare(x, y, method = c("cosine", "jaccard", "dice",
  "gilbertwells", "dennis", "sorgenfrei", "lancewilliams", "euclid", "hamming"))

Arguments

x

either a fingerprint object of class Filter, Expression, Term or Document, or a sparse matrix of class 'dgCMatrix' for which you want to calculate similarities. The matrix can be obtained by calling 'as.matrix()' on a Collection object.

y

the reference fingerprint: an object of class Filter, Expression, Term or Document

method

one of the following: "cosine", "jaccard", "dice", "gilbertwells", "dennis", "sorgenfrei" (similarity) or "lancewilliams", "euclid", "hamming" (distance)
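To make the method names more concrete, here is a minimal sketch of the usual textbook definitions of some of these metrics for two binary (0/1) vectors. The helper `binary_metric()` and the toy vectors are hypothetical and not part of the package; the package's own implementation may differ, for example in how it normalises the distances.

```r
# Hypothetical helper (not part of the package): textbook definitions
# of a few metrics for two binary (0/1) vectors of equal length.
binary_metric <- function(a, b,
                          method = c("cosine", "jaccard", "dice",
                                     "euclid", "hamming")) {
  method <- match.arg(method)
  stopifnot(length(a) == length(b))
  both <- sum(a & b)   # positions set in both vectors
  n_a  <- sum(a != 0)  # positions set in a
  n_b  <- sum(b != 0)  # positions set in b
  switch(method,
    cosine  = both / sqrt(n_a * n_b),
    jaccard = both / (n_a + n_b - both),
    dice    = 2 * both / (n_a + n_b),
    euclid  = sqrt(sum((a - b)^2)),
    hamming = sum(a != b)
  )
}

# Toy fingerprints: a is set at positions 1, 2, 4; b at positions 1, 4, 5
a <- c(1, 1, 0, 1, 0, 0)
b <- c(1, 0, 0, 1, 1, 0)

binary_metric(a, b, "jaccard")  # 2 shared / 4 in the union = 0.5
binary_metric(a, b, "hamming")  # 2 positions differ
```

Similarity metrics grow as the fingerprints overlap more, while distance metrics shrink, so the two groups are read in opposite directions.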

Value

the similarity or distance between two fingerprints or, when x is a matrix, a matrix of n similarity/distance values, one for each document compared with the reference fingerprint

Details

There are two ways to call this function. You can compare a sparse binary matrix, obtained by turning a Collection object into a matrix, with a fingerprint (a Document, Expression, Term or Filter), or you can simply pass two Fingerprint objects.
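The matrix form can be pictured as comparing every document against the reference in one vectorised step. The sketch below illustrates this with the Matrix package and the Jaccard similarity. It assumes one document per row, which is an assumption of this example and not something stated here, and it uses toy data in place of a real Collection; it is not the package's implementation.

```r
library(Matrix)

# Toy stand-in for as.matrix(<Collection>): 3 documents x 6 positions.
# The row-per-document layout is an assumption made for this sketch.
docs <- sparseMatrix(
  i = c(1, 1, 1, 2, 2, 3),
  j = c(1, 2, 4, 1, 5, 6),
  x = 1,
  dims = c(3, 6)
)

# Toy reference fingerprint, set at positions 1, 4 and 5
ref <- c(1, 0, 0, 1, 1, 0)

# Jaccard similarity of each row against the reference
shared   <- as.vector(docs %*% ref)  # positions set in both
doc_size <- rowSums(docs)            # positions set in each document
jaccard  <- shared / (doc_size + sum(ref) - shared)
jaccard
```

With this toy data the three documents score 0.5, 2/3 and 0, giving one value per document.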

See also

This paper with similarity metrics

Examples

# NOT RUN {
# Get data
data("company_descriptions")

# Put text in a list
txt <- lapply(company_descriptions, function(x) x$desc)

# Fingerprint documents
txt_fp <- do_fingerprint_document(txt)

# Fingerprint a term
trm_fp <- do_fingerprint_term("finance")

# We can compare:
#  - a document with a document
do_compare(txt_fp[[1]], txt_fp[[2]])
#  - a term with a document
do_compare(txt_fp[[1]], trm_fp)
#  - an expression with a document
#  ... anything with a fingerprint

# We can also compare a sparse binary matrix
# with another fingerprint

# Convert the fingerprinted documents to a matrix
txt_fp_mat <- as.matrix(txt_fp)
# Compare to term
do_compare(txt_fp_mat, trm_fp)
# }