Commit

CRAN v3.2.3
Kenneth Benoit authored and Kenneth Benoit committed Aug 29, 2022
1 parent d74c253 commit 3494a6f
Showing 9 changed files with 44 additions and 40 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -3,7 +3,7 @@ Version: 3.2.3
 Title: Quantitative Analysis of Textual Data
 Description: A fast, flexible, and comprehensive framework for
 quantitative text analysis in R. Provides functionality for corpus management,
-creating and manipulating tokens and ngrams, exploring keywords in context,
+creating and manipulating tokens and n-grams, exploring keywords in context,
 forming and manipulating sparse matrices
 of documents by features and feature co-occurrences, analyzing keywords, computing feature similarities and
 distances, applying content dictionaries, applying supervised and unsupervised machine learning,
2 changes: 1 addition & 1 deletion R/dfm_lookup.R
@@ -24,7 +24,7 @@
 #' @export
 #' @note If using `dfm_lookup` with dictionaries containing multi-word
 #' values, matches will only occur if the features themselves are multi-word
-#' or formed from ngrams. A better way to match dictionary values that include
+#' or formed from n-grams. A better way to match dictionary values that include
 #' multi-word patterns is to apply [tokens_lookup()] to the tokens,
 #' and then construct the dfm.
 #' @keywords dfm
8 changes: 4 additions & 4 deletions R/fcm.R
@@ -64,11 +64,11 @@
 #' `is.fcm(x)` returns `TRUE` if and only if its x is an object of
 #' type [fcm].
 #' @references
-#' Momtazi, S., Khudanpur, S., & Klakow, D. (2010). "[A comparative study of
+#' Momtazi, S., Khudanpur, S., & Klakow, D. (2010). "A comparative study of
 #' word co-occurrence for term clustering in language model-based sentence
-#' retrieval.](https://aclanthology.org/N10-1046/)" *Human Language
-#' Technologies: The 2010 Annual Conference of the North American Chapter of
-#' the ACL*, Los Angeles, California, June 2010, 325-328.
+#' retrieval. *Human Language Technologies: The 2010 Annual Conference of the
+#' North American Chapter of the ACL*, Los Angeles, California, June 2010,
+#' 325-328. https://aclanthology.org/N10-1046/
 #'
 #' Jurafsky, D. & Martin, J.H. (2018). From *Speech and Language Processing:
 #' An Introduction to Natural Language Processing, Computational Linguistics,
29 changes: 14 additions & 15 deletions R/tokens_ngrams.R
@@ -1,18 +1,18 @@
-#' Create ngrams and skipgrams from tokens
+#' Create n-grams and skip-grams from tokens
 #'
-#' Create a set of ngrams (tokens in sequence) from already tokenized text
-#' objects, with an optional skip argument to form skipgrams. Both the ngram
+#' Create a set of n-grams (tokens in sequence) from already tokenized text
+#' objects, with an optional skip argument to form skip-grams. Both the n-gram
 #' length and the skip lengths take vectors of arguments to form multiple
 #' lengths or skips in one pass. Implemented in C++ for efficiency.
-#' @return a tokens object consisting a list of character vectors of ngrams, one
+#' @return a tokens object consisting a list of character vectors of n-grams, one
 #' list element per text, or a character vector if called on a simple
 #' character vector
 #' @param x a tokens object, or a character vector, or a list of characters
 #' @param n integer vector specifying the number of elements to be concatenated
-#' in each ngram. Each element of this vector will define a \eqn{n} in the
+#' in each n-gram. Each element of this vector will define a \eqn{n} in the
 #' \eqn{n}-gram(s) that are produced.
 #' @param skip integer vector specifying the adjacency skip size for tokens
-#' forming the ngrams, default is 0 for only immediately neighbouring words.
+#' forming the n-grams, default is 0 for only immediately neighbouring words.
 #' For `skipgrams`, `skip` can be a vector of integers, as the
 #' "classic" approach to forming skip-grams is to set skip = \eqn{k} where
 #' \eqn{k} is the distance for which \eqn{k} or fewer skips are used to
@@ -24,7 +24,7 @@
 #' (underscore) character
 #' @details Normally, these functions will be called through
 #' `[tokens](x, ngrams = , ...)`, but these functions are provided
-#' in case a user wants to perform lower-level ngram construction on tokenized
+#' in case a user wants to perform lower-level n-gram construction on tokenized
 #' texts.
 #' @export
 #' @examples
@@ -116,17 +116,16 @@ tokens_ngrams.tokens <- function(x, n = 2L, skip = 0L, concatenator = "_") {

 #' @rdname tokens_ngrams
 #' @details
-#' [tokens_skipgrams()] is a wrapper to [tokens_ngrams()]
-#' that requires arguments to be supplied for both `n` and `skip`.
-#' For \eqn{k}-skip skipgrams, set `skip` to `0:`\eqn{k}, in order
-#' to conform to the definition of skip-grams found in Guthrie et al (2006): A
-#' \eqn{k} skip-gram is an ngram which is a superset of all ngrams and each
-#' \eqn{(k-i)} skipgram until \eqn{(k-i)==0} (which includes 0 skip-grams).
+#' [tokens_skipgrams()] is a wrapper to [tokens_ngrams()] that requires
+#' arguments to be supplied for both `n` and `skip`. For \eqn{k}-skip
+#' skip-grams, set `skip` to `0:`\eqn{k}, in order to conform to the
+#' definition of skip-grams found in Guthrie et al (2006): A \eqn{k} skip-gram
+#' is an n-gram which is a superset of all n-grams and each \eqn{(k-i)}
+#' skip-gram until \eqn{(k-i)==0} (which includes 0 skip-grams).
 #' @export
 #' @references
 #' Guthrie, David, Ben Allison, Wei Liu, Louise Guthrie, and Yorick Wilks. 2006.
-#' "[A Closer
-#' Look at Skip-Gram Modelling](https://aclanthology.org/L06-1210/)."
+#' "A Closer Look at Skip-Gram Modelling." `https://aclanthology.org/L06-1210/`
 #' @importFrom utils combn
 #' @examples
 #' # skipgrams
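Note on the `skip` semantics documented in R/tokens_ngrams.R: the superset property described in the `@details` text can be illustrated with a small, language-agnostic sketch. The Python below is not the package's C++ implementation. The function name `skipgrams` and the example tokens are hypothetical, and the parameter names merely mirror `n`, `skip`, and `concatenator` from the documented signature. It also assumes each gap between adjacent tokens is drawn independently from the allowed skips; whether that matches the package or the Guthrie et al. definition for n > 2 is not confirmed here. For bigrams there is a single gap, so the two readings coincide.

```python
from itertools import product


def skipgrams(tokens, n=2, skip=(0,), concatenator="_"):
    """Form n-grams whose adjacent tokens are separated by an allowed skip.

    Hypothetical helper for illustration only; assumes every gap is chosen
    independently from `skip`.
    """
    allowed = sorted(set(skip))
    results = []
    for start in range(len(tokens)):
        # Choose one skip size for each of the n - 1 gaps in the n-gram.
        for gaps in product(allowed, repeat=n - 1):
            positions = [start]
            for gap in gaps:
                positions.append(positions[-1] + gap + 1)
            # Keep the n-gram only if it fits inside the token sequence.
            if positions[-1] < len(tokens):
                results.append(concatenator.join(tokens[p] for p in positions))
    return results


tokens = ["insurgents", "killed", "in", "ongoing", "fighting"]

# skip = 0 gives ordinary bigrams.
bigrams = skipgrams(tokens, n=2, skip=(0,))

# skip = 0..1 (the analogue of `0:1` in R) gives 1-skip bigrams.
one_skip_bigrams = skipgrams(tokens, n=2, skip=range(0, 2))
```

With `skip` set to the full range from 0 to k, every ordinary bigram is also present in the result, which is the superset relationship the documentation describes.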
2 changes: 1 addition & 1 deletion README.md
@@ -6,7 +6,7 @@ data](https://cdn.rawgit.com/quanteda/quanteda/master/images/quanteda_logo.svg)]

 [![CRAN
 Version](https://www.r-pkg.org/badges/version/quanteda)](https://CRAN.R-project.org/package=quanteda)
-[![](https://img.shields.io/badge/devel%20version-3.2.2-royalblue.svg)](https://github.com/quanteda/quanteda)
+[![](https://img.shields.io/badge/devel%20version-3.2.3-royalblue.svg)](https://github.com/quanteda/quanteda)
 [![Downloads](https://cranlogs.r-pkg.org/badges/quanteda)](https://CRAN.R-project.org/package=quanteda)
 [![Total
 Downloads](https://cranlogs.r-pkg.org/badges/grand-total/quanteda?color=orange)](https://CRAN.R-project.org/package=quanteda)
3 changes: 3 additions & 0 deletions inst/WORDLIST
@@ -1,3 +1,4 @@
+aclanthology
 ACL
 Biden
 Biden's
@@ -161,6 +162,8 @@ naivebayes
 nfeature
 ngram
 ngrams
+n-gram
+n-grams
 nsentence
 ntoken
 nuls
2 changes: 1 addition & 1 deletion man/dfm_lookup.Rd

8 changes: 5 additions & 3 deletions man/fcm.Rd

28 changes: 14 additions & 14 deletions man/tokens_ngrams.Rd