R/generics.R, R/methods_fingerprint.R, R/methods_other.R
do_compare-methods.Rd
Calculate a variety of similarity and distance metrics
do_compare(x, y, method = c("cosine", "jaccard", "dice", "gilbertwells",
  "dennis", "sorgenfrei", "lancewilliams", "euclid", "hamming"))

# S4 method for Fingerprint
do_compare(x, y, method = c("cosine", "jaccard", "dice", "gilbertwells",
  "dennis", "sorgenfrei", "lancewilliams", "euclid", "hamming"))

# S4 method for dgCMatrix
do_compare(x, y, method = c("cosine", "jaccard", "dice", "gilbertwells",
  "dennis", "sorgenfrei", "lancewilliams", "euclid", "hamming"))
Argument | Description |
---|---|
x | either an object of class Filter, Expression, Term or Document, or a sparse matrix of class 'dgCMatrix' for which you want to calculate similarities. This matrix can be obtained by calling 'as.matrix()' on a Collection object. |
y | reference fingerprint: an object of class Filter, Expression, Term or Document |
method | one of the following: "cosine", "jaccard", "dice", "gilbertwells", "dennis", "sorgenfrei" (similarity metrics) or "lancewilliams", "euclid", "hamming" (distance metrics) |
The similarity or distance metric between two fingerprints, or a matrix of length n holding the similarity/distance metrics between each of n documents and the reference fingerprint.
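To make a few of these metrics concrete, here is a minimal base R sketch that computes the textbook binary-vector forms of cosine, Jaccard, Dice, Hamming and Euclidean on two toy fingerprints. The vectors and the names `n11`, `n10`, `n01` and `metrics` are illustrative assumptions, and the formulas are the standard definitions, not necessarily the exact code path used by do_compare().

```r
# Two toy binary fingerprints (hypothetical data, not taken from the package).
x <- c(1, 1, 0, 1, 0, 0, 1, 0)
y <- c(1, 0, 0, 1, 1, 0, 1, 0)

# Contingency counts for binary vectors.
n11 <- sum(x == 1 & y == 1)  # bits set in both      -> 3
n10 <- sum(x == 1 & y == 0)  # bits set only in x    -> 1
n01 <- sum(x == 0 & y == 1)  # bits set only in y    -> 1

# Standard definitions (assumed; the package may differ in detail).
metrics <- c(
  cosine  = n11 / sqrt((n11 + n10) * (n11 + n01)),  # similarity, 0.75
  jaccard = n11 / (n11 + n10 + n01),                # similarity, 0.6
  dice    = 2 * n11 / (2 * n11 + n10 + n01),        # similarity, 0.75
  hamming = n10 + n01,                              # distance, 2
  euclid  = sqrt(n10 + n01)                         # distance, about 1.414
)

print(round(metrics, 3))
```

Higher values mean "more alike" for the similarity metrics and "less alike" for the distance metrics, so the two groups should not be ranked in the same direction.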
You can either compare a sparse binary matrix (obtained by turning a Collection object into a matrix) with another fingerprint (a Document, Expression, Term or Filter), or simply pass two Fingerprint objects.
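The matrix case can be pictured as one comparison per row. The sketch below assumes a layout with one document per row and one fingerprint position per column, builds a small 'dgCMatrix' with the Matrix package, and computes the standard Jaccard similarity of each row against a reference vector. The data and the names `docs`, `ref` and `jaccard` are hypothetical, and this is not the package's internal implementation.

```r
library(Matrix)

# Hypothetical sparse binary matrix: 3 documents (rows) x 8 positions (columns).
docs <- sparseMatrix(
  i = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  j = c(1, 2, 4, 4, 5, 1, 4, 5, 7),
  x = rep(1, 9),
  dims = c(3, 8)
)

# Hypothetical reference fingerprint as a plain binary vector.
ref <- c(1, 0, 0, 1, 1, 0, 1, 0)

# Bits shared between each document and the reference.
shared <- drop(as.matrix(docs %*% ref))

# Bits set per document and in the reference.
n_doc <- Matrix::rowSums(docs)
n_ref <- sum(ref)

# Jaccard similarity per document: shared / (union of set bits).
jaccard <- shared / (n_doc + n_ref - shared)

print(jaccard)
```

Using a matrix product to count shared bits keeps the computation sparse, which matters when there are many documents and most positions are zero.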
This paper with similarity metrics
# NOT RUN {
# Get data
data("company_descriptions")

# Put text in a list
txt <- lapply(company_descriptions, function(x) x$desc)

# Fingerprint documents
txt_fp <- do_fingerprint_document(txt)

# Fingerprint a term
trm_fp <- do_fingerprint_term("finance")

# We can compare:
# - a document with a document
do_compare(txt_fp[[1]], txt_fp[[2]])
# - a term with a document
do_compare(txt_fp[[1]], trm_fp)
# - an expression with a document
# ... anything with a fingerprint

# We can also compare a sparse binary matrix
# with another fingerprint

# Convert the fingerprinted documents to a matrix
txt_fp_mat <- as.matrix(txt_fp)

# Compare to term
do_compare(txt_fp_mat, trm_fp)
# }