The Document class is one of the four core classes in the sfutils package. A Document is a (large) body of text.
Document(text, ...)
Argument | Description |
---|---|
text | text to be fingerprinted |
... | other options to be passed (uuid, fingerprint) |
(From http://documentation.cortical.io/working_with_text.html) The functionality we offer for text is a little more elaborate than for terms, given the more complex nature of texts. Besides getting a semantic fingerprint (semantic representation) for a given text (the /text endpoint), one can also get a list of keywords extracted from the text, or get the text split up into smaller consecutive chunks, based on information content. We also provide functionality for extracting terms from a text based on part-of-speech tags. There is also a bulk endpoint for merging several /text requests into just one HTTP request. Finally, there is a detect_language endpoint capable of detecting 50 languages.
text: text to be fingerprinted
fingerprint: numeric vector of the fingerprint
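Since the fingerprint is stored as a numeric vector, two documents can be compared with ordinary vector operations. The following is a minimal base R sketch, not part of sfutils: it assumes the vector holds the positions of the active bits (this page does not state that), the two fingerprints are made-up values, and `fingerprint_jaccard` is a hypothetical helper. The package may provide its own comparison functions, which would be preferable where available.

```r
# Hypothetical fingerprints (made-up values, not real API output).
# Assumption: each vector lists the positions of the active bits.
fp_a <- c(3, 17, 42, 128, 900, 4051)
fp_b <- c(17, 42, 77, 900, 1500, 4051)

# Hypothetical helper: Jaccard similarity between two fingerprints,
# i.e. shared positions divided by all distinct positions.
fingerprint_jaccard <- function(x, y) {
  x <- unique(x)
  y <- unique(y)
  n_union <- length(union(x, y))
  if (n_union == 0) {
    return(NA_real_)  # undefined when both fingerprints are empty
  }
  length(intersect(x, y)) / n_union
}

# Positions active in both fingerprints
overlap <- intersect(fp_a, fp_b)

# Similarity score between 0 (no overlap) and 1 (identical)
similarity <- fingerprint_jaccard(fp_a, fp_b)
print(overlap)
print(similarity)
```

With real data, `fp_a` and `fp_b` would be replaced by the fingerprint vectors of two Document objects.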
See the Cortical documentation for more information about semantic fingerprinting and text.
# NOT RUN {
# Get data
data("company_descriptions")

# Get a single text
txt <- company_descriptions$unilever$desc

# Fingerprint document
txt_fp_1 <- do_fingerprint_document(txt)

# This is equivalent to the above, but the above is more convenient
# because it can fingerprint documents in bulk
txt_fp_2 <- Document(txt)

# You can also pass a fingerprint to the Document constructor,
# in which case the API won't be called
txt_fp_3 <- Document(txt, fingerprint = fingerprint(txt_fp_1))
# }