removePunctuation A logical value indicating whether punctuation characters
should be removed from doc, a custom function which performs punctuation
removal, or a list of arguments for removePunctuation. Defaults to FALSE.
removeNumbers A logical value indicating whether numbers should be removed
from doc or a custom function for number removal. Defaults to FALSE.
stopwords Either a Boolean value indicating stopword removal using default
language-specific stopword lists shipped with this package, a character vector
holding custom stopwords, or a custom function for stopword removal.
Defaults to FALSE.
stemming Either a Boolean value indicating whether tokens should be stemmed
or a custom stemming function. Defaults to FALSE.
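Because these options transform the tokens one after another, their order can change the result. The following is a minimal base-R sketch, not tm's implementation, assuming the options are applied in the order in which they appear in the control list. The function preprocess_tokens() is hypothetical; it models only removePunctuation, removeNumbers and custom (character-vector or function) stopwords, and treats a list of removePunctuation arguments like TRUE.

```r
# Hypothetical sketch, NOT tm's implementation: apply a subset of the
# order-sensitive options to a character vector of tokens, in the order in
# which they appear in `control`.
preprocess_tokens <- function(tokens, control = list()) {
  for (opt in names(control)) {
    val <- control[[opt]]
    if (isFALSE(val)) next                       # option switched off
    tokens <- switch(opt,
      # TRUE (or a list of arguments) strips all punctuation in this sketch
      removePunctuation = if (is.function(val)) val(tokens)
                          else gsub("[[:punct:]]+", "", tokens),
      removeNumbers     = if (is.function(val)) val(tokens)
                          else gsub("[[:digit:]]+", "", tokens),
      # only a custom function or a character vector of stopwords is modelled
      stopwords         = if (is.function(val)) val(tokens)
                          else if (is.character(val)) tokens[!tokens %in% val]
                          else tokens,
      tokens)                                    # other options: unchanged
  }
  tokens[nzchar(tokens)]                         # drop tokens emptied above
}

toks <- c("oil", "that,", "rose", "12%")
# Punctuation first: "that," becomes "that" and is then removed as a stopword.
preprocess_tokens(toks, list(removePunctuation = TRUE, stopwords = "that"))
# Stopwords first: "that," does not match "that", so "that" survives.
preprocess_tokens(toks, list(stopwords = "that", removePunctuation = TRUE))
```

The two calls differ only in the order of the same two options, which is why listing stopword removal after punctuation removal is usually what one wants.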
Finally, the following options are processed in the given order.
dictionary A character vector to be tabulated against. No other terms will be
listed in the result. Defaults to NULL, which means that all terms in doc are
listed.
bounds A list with a tag local whose value must be an integer vector of
length 2. Terms that appear less often in doc than the lower bound
bounds$local[1] or more often than the upper bound bounds$local[2] are
discarded. Defaults to list(local = c(1, Inf)) (i.e., every token will be used).
wordLengths An integer vector of length 2. Words shorter than the minimum
word length wordLengths[1] or longer than the maximum word length
wordLengths[2] are discarded. Defaults to c(3, Inf), i.e., a minimum
word length of 3 characters.
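The three filtering steps above can be sketched in base R as follows. This is an illustration of the documented semantics, not tm's code: filter_terms() is a hypothetical helper that tabulates a character vector of tokens and then applies dictionary, bounds and wordLengths in that order, treating both bounds as inclusive. It returns a plain named integer vector rather than an object of class term_frequency.

```r
# Hypothetical sketch, NOT tm's implementation: tabulate tokens, then apply
# dictionary, bounds and wordLengths in that order (bounds are inclusive).
filter_terms <- function(tokens, dictionary = NULL,
                         bounds = list(local = c(1, Inf)),
                         wordLengths = c(3, Inf)) {
  tab <- table(tokens)
  freq <- setNames(as.integer(tab), names(tab))  # named integer vector
  if (!is.null(dictionary))                      # keep dictionary terms only
    freq <- freq[names(freq) %in% dictionary]
  lb <- bounds$local                             # local frequency bounds
  freq <- freq[freq >= lb[1] & freq <= lb[2]]
  nc <- nchar(names(freq))                       # term lengths in characters
  freq[nc >= wordLengths[1] & nc <= wordLengths[2]]
}

toks <- c("oil", "oil", "oil", "price", "price", "opec", "is")
filter_terms(toks)                                # "is" is too short
filter_terms(toks, bounds = list(local = c(2, Inf)))
filter_terms(toks, dictionary = c("oil", "opec"), wordLengths = c(4, Inf))
```

In the last call the dictionary keeps only "oil" and "opec", and the minimum word length of 4 then drops "oil", leaving only "opec".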
A named integer vector of class term_frequency with term frequencies as values and tokens as names.
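library("tm")
data("crude")

# Custom tokenizer: split the document text on runs of whitespace.
strsplit_space_tokenizer <- function(x)
    unlist(strsplit(as.character(x), "[[:space:]]+"))
ctrl <- list(tokenize = strsplit_space_tokenizer,
             removePunctuation = list(preserve_intra_word_dashes = TRUE),
             stopwords = c("reuter", "that"),
             stemming = TRUE,
             wordLengths = c(4, Inf))
# termFreq() expects a single document, so index the corpus with [[ ]].
termFreq(crude[[14]], control = ctrl)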
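To see why the example combines this tokenizer with removePunctuation, the tokenizer can be run on its own in base R. The input string below is hypothetical and is passed as a plain character string rather than a tm document.

```r
# The whitespace tokenizer from the example, applied to a plain string
# (hypothetical input) without tm.
strsplit_space_tokenizer <- function(x)
    unlist(strsplit(as.character(x), "[[:space:]]+"))

tokens <- strsplit_space_tokenizer("Oil prices rose\tagain, oil-traders said.")
tokens
# "Oil" "prices" "rose" "again," "oil-traders" "said."
```

Splitting only on whitespace leaves punctuation attached ("again,", "said."), so the example also sets removePunctuation; passing preserve_intra_word_dashes = TRUE keeps hyphenated tokens such as "oil-traders" intact.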