690 fingerprinted documents (fps_train) and 189 fingerprinted documents (fps_test) belonging to nine categories, taken from the 'reuters21578' dataset in the 'tm.corpus.Reuters21578' package. The data has been processed so that a) only articles that belong to exactly one class are kept, and b) only articles belonging to one of the following topics are kept: grain, corn, crude, livestock, wheat, coffee, sugar, gold, copper, cocoa. The label_binomial variable is created by recoding each topic as 'crude' if it is 'crude' and as 'other' if it is any of the other classes.
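The recoding into binomial labels can be sketched in base R as follows. This is a minimal illustration, not the code used to build the dataset: the `topics` vector is made-up example data, and the use of `ifelse` is an assumption about how such a recoding might be done.

```r
# Hypothetical topic vector; the real labels ship with fps_train / fps_test.
topics <- c("grain", "crude", "coffee", "crude", "gold")

# Keep "crude" as is and collapse every other topic into "other".
label_binomial <- ifelse(topics == "crude", "crude", "other")

# Count how many documents fall into each of the two classes.
table(label_binomial)
```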

fps_train

fps_test

Format

A list with three entries:

  1. A vector called 'label_binomial' with two class labels: 'crude' and 'other'

  2. A vector called 'label_multinomial' with the nine original class labels

  3. A list of 690 (train) or 189 (test) fingerprinted documents of S4 class 'Document'

    text

    original document

    fingerprint

    fingerprint of document

    uuid

    unique id

    type

    type of the document that was fingerprinted
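The layout described above can be explored with a small stand-in object. The sketch below does not load the package data: `MockDocument`, its slot types, the sample values, and `fps_mock` are all hypothetical and only mirror the slot names listed here. The third entry is accessed by position because its name is not stated.

```r
library(methods)

# Stand-in for the package's S4 class 'Document'; slot types are assumptions.
setClass("MockDocument", representation(
  text        = "character",
  fingerprint = "integer",
  uuid        = "character",
  type        = "character"
))

doc <- new("MockDocument",
           text        = "Oil prices rose.",
           fingerprint = c(12L, 87L, 1024L),
           uuid        = "doc-0001",
           type        = "document")

# Mimic the three-entry list: two label vectors and a list of documents.
fps_mock <- list(
  label_binomial    = "crude",
  label_multinomial = "crude",
  list(doc)
)

# S4 slots are read with the @ operator.
first_doc <- fps_mock[[3]][[1]]
first_doc@text
length(first_doc@fingerprint)
```

With the real data, the same pattern would presumably apply to `fps_train` and `fps_test`, with the first two entries holding one label per document.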

Source

https://archive.ics.uci.edu/ml/datasets/Reuters-21578+Text+Categorization+Collection