CRAN v3.2.3

quanteda · Aug 29, 2022 · 3494a6f
1 parent d74c253
commit 3494a6f
Showing 9 changed files with 44 additions and 40 deletions.
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -3,7 +3,7 @@ Version: 3.2.3
 Title: Quantitative Analysis of Textual Data
 Description: A fast, flexible, and comprehensive framework for 
     quantitative text analysis in R.  Provides functionality for corpus management,
-    creating and manipulating tokens and ngrams, exploring keywords in context, 
+    creating and manipulating tokens and n-grams, exploring keywords in context, 
     forming and manipulating sparse matrices
     of documents by features and feature co-occurrences, analyzing keywords, computing feature similarities and
     distances, applying content dictionaries, applying supervised and unsupervised machine learning, 

diff --git a/R/dfm_lookup.R b/R/dfm_lookup.R
@@ -24,7 +24,7 @@
 #' @export
 #' @note If using `dfm_lookup` with dictionaries containing multi-word
 #'   values, matches will only occur if the features themselves are multi-word
-#'   or formed from ngrams. A better way to match dictionary values that include
+#'   or formed from n-grams. A better way to match dictionary values that include
 #'   multi-word patterns is to apply [tokens_lookup()] to the tokens,
 #'   and then construct the dfm.
 #' @keywords dfm

diff --git a/R/fcm.R b/R/fcm.R
@@ -64,11 +64,11 @@
 #'   `is.fcm(x)` returns `TRUE` if and only if its x is an object of
 #'   type [fcm].
 #' @references
-#'   Momtazi, S., Khudanpur, S., & Klakow, D. (2010). "[A comparative study of
+#'   Momtazi, S., Khudanpur, S., & Klakow, D. (2010). "A comparative study of
 #'   word co-occurrence for term clustering in language model-based sentence
-#'   retrieval.](https://aclanthology.org/N10-1046/)" *Human Language
-#'   Technologies: The 2010 Annual Conference of the North American Chapter of
-#'   the ACL*, Los Angeles, California, June 2010, 325-328.
+#'   retrieval. *Human Language Technologies: The 2010 Annual Conference of the
+#'   North American Chapter of the ACL*, Los Angeles, California, June 2010,
+#'   325-328.  https://aclanthology.org/N10-1046/
 #'
 #'   Jurafsky, D. & Martin, J.H. (2018). From *Speech and Language Processing:
 #'   An Introduction to Natural Language Processing, Computational Linguistics,

diff --git a/R/tokens_ngrams.R b/R/tokens_ngrams.R
@@ -1,18 +1,18 @@
-#' Create ngrams and skipgrams from tokens
+#' Create n-grams and skip-grams from tokens
 #'
-#' Create a set of ngrams (tokens in sequence) from already tokenized text
-#' objects, with an optional skip argument to form skipgrams. Both the ngram
+#' Create a set of n-grams (tokens in sequence) from already tokenized text
+#' objects, with an optional skip argument to form skip-grams. Both the n-gram
 #' length and the skip lengths take vectors of arguments to form multiple
 #' lengths or skips in one pass.  Implemented in C++ for efficiency.
-#' @return a tokens object consisting a list of character vectors of ngrams, one
+#' @return a tokens object consisting a list of character vectors of n-grams, one
 #'   list element per text, or a character vector if called on a simple
 #'   character vector
 #' @param x a tokens object, or a character vector, or a list of characters
 #' @param n integer vector specifying the number of elements to be concatenated
-#'   in each ngram.  Each element of this vector will define a \eqn{n} in the
+#'   in each n-gram.  Each element of this vector will define a \eqn{n} in the
 #'   \eqn{n}-gram(s) that are produced.
 #' @param skip integer vector specifying the adjacency skip size for tokens
-#'   forming the ngrams, default is 0 for only immediately neighbouring words.
+#'   forming the n-grams, default is 0 for only immediately neighbouring words.
 #'   For `skipgrams`, `skip` can be a vector of integers, as the
 #'   "classic" approach to forming skip-grams is to set skip = \eqn{k} where
 #'   \eqn{k} is the distance for which \eqn{k} or fewer skips are used to
@@ -24,7 +24,7 @@
 #'   (underscore) character
 #' @details Normally, these functions will be called through
 #'   `[tokens](x, ngrams = , ...)`, but these functions are provided
-#'   in case a user wants to perform lower-level ngram construction on tokenized
+#'   in case a user wants to perform lower-level n-gram construction on tokenized
 #'   texts.
 #' @export
 #' @examples
@@ -116,17 +116,16 @@ tokens_ngrams.tokens <- function(x, n = 2L, skip = 0L, concatenator = "_") {
 
 #' @rdname tokens_ngrams
 #' @details
-#'   [tokens_skipgrams()] is a wrapper to [tokens_ngrams()]
-#'   that requires arguments to be supplied for both `n` and `skip`.
-#'   For \eqn{k}-skip skipgrams, set `skip` to `0:`\eqn{k}, in order
-#'   to conform to the definition of skip-grams found in Guthrie et al (2006): A
-#'   \eqn{k} skip-gram is an ngram which is a superset of all ngrams and each
-#'   \eqn{(k-i)} skipgram until \eqn{(k-i)==0} (which includes 0 skip-grams).
+#'   [tokens_skipgrams()] is a wrapper to [tokens_ngrams()] that requires
+#'   arguments to be supplied for both `n` and `skip`. For \eqn{k}-skip
+#'   skip-grams, set `skip` to `0:`\eqn{k}, in order to conform to the
+#'   definition of skip-grams found in Guthrie et al (2006): A \eqn{k} skip-gram
+#'   is an n-gram which is a superset of all n-grams and each \eqn{(k-i)}
+#'   skip-gram until \eqn{(k-i)==0} (which includes 0 skip-grams).
 #' @export
 #' @references
 #' Guthrie, David, Ben Allison, Wei Liu, Louise Guthrie, and Yorick Wilks. 2006.
-#' "[A Closer
-#' Look at Skip-Gram Modelling](https://aclanthology.org/L06-1210/)."
+#' "A Closer Look at Skip-Gram Modelling." `https://aclanthology.org/L06-1210/`
 #' @importFrom utils combn
 #' @examples
 #' # skipgrams

diff --git a/README.md b/README.md
@@ -6,7 +6,7 @@ data](https://cdn.rawgit.com/quanteda/quanteda/master/images/quanteda_logo.svg)]
 
 [![CRAN
 Version](https://www.r-pkg.org/badges/version/quanteda)](https://CRAN.R-project.org/package=quanteda)
-[![](https://img.shields.io/badge/devel%20version-3.2.2-royalblue.svg)](https://github.com/quanteda/quanteda)
+[![](https://img.shields.io/badge/devel%20version-3.2.3-royalblue.svg)](https://github.com/quanteda/quanteda)
 [![Downloads](https://cranlogs.r-pkg.org/badges/quanteda)](https://CRAN.R-project.org/package=quanteda)
 [![Total
 Downloads](https://cranlogs.r-pkg.org/badges/grand-total/quanteda?color=orange)](https://CRAN.R-project.org/package=quanteda)

diff --git a/inst/WORDLIST b/inst/WORDLIST
@@ -1,3 +1,4 @@
+aclanthology
 ACL
 Biden
 Biden's
@@ -161,6 +162,8 @@ naivebayes
 nfeature
 ngram
 ngrams
+n-gram
+n-grams
 nsentence
 ntoken
 nuls

diff --git a/man/dfm_lookup.Rd b/man/dfm_lookup.Rd
diff --git a/man/fcm.Rd b/man/fcm.Rd
diff --git a/man/tokens_ngrams.Rd b/man/tokens_ngrams.Rd