Pass ... to base::print #2381

Open

Collaborator

@koheiw koheiw commented Apr 23, 2024

Pass the ... arguments of print.tokens() through to base::print() so that users can hide the quotes around tokens with quote = FALSE. I think the output is easier to read without quotes.

require(quanteda)
#> Loading required package: quanteda
#> Package version: 4.0.2
#> Unicode version: 15.1
#> ICU version: 74.1
#> Parallel computing: 8 of 8 threads used.
#> See https://quanteda.io for tutorials and examples.
toks <- tokens(data_corpus_inaugural[1:3])
print(toks)
#> Tokens consisting of 3 documents and 4 docvars.
#> 1789-Washington :
#>  [1] "Fellow-Citizens" "of"              "the"             "Senate"         
#>  [5] "and"             "of"              "the"             "House"          
#>  [9] "of"              "Representatives" ":"               "Among"          
#> [ ... and 1,525 more ]
#> 
#> 1793-Washington :
#>  [1] "Fellow"   "citizens" ","        "I"        "am"       "again"   
#>  [7] "called"   "upon"     "by"       "the"      "voice"    "of"      
#> [ ... and 135 more ]
#> 
#> 1797-Adams :
#>  [1] "When"      "it"        "was"       "first"     "perceived" ","        
#>  [7] "in"        "early"     "times"     ","         "that"      "no"       
#> [ ... and 2,565 more ]
print(toks, quote = FALSE)
#> Tokens consisting of 3 documents and 4 docvars.
#> 1789-Washington :
#>  [1] Fellow-Citizens of              the             Senate         
#>  [5] and             of              the             House          
#>  [9] of              Representatives :               Among          
#> [ ... and 1,525 more ]
#> 
#> 1793-Washington :
#>  [1] Fellow   citizens ,        I        am       again    called   upon    
#>  [9] by       the      voice    of      
#> [ ... and 135 more ]
#> 
#> 1797-Adams :
#>  [1] When      it        was       first     perceived ,         in       
#>  [8] early     times     ,         that      no       
#> [ ... and 2,565 more ]
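
The mechanism can be sketched without quanteda. The toy class and method below are hypothetical stand-ins (toy_tokens is not quanteda's tokens class, and this is not the code in the PR); they only illustrate how an S3 print method can forward ... to base::print() so that a caller-supplied quote = FALSE reaches print.default().

```r
# Toy S3 class standing in for a tokens object (hypothetical, not quanteda's).
toy_tokens <- function(x) structure(x, class = "toy_tokens")

# Forward ... to base::print() so callers can pass arguments such as quote = FALSE.
print.toy_tokens <- function(x, ...) {
  docs <- unclass(x)
  for (doc in names(docs)) {
    cat(doc, ":\n")
    print(docs[[doc]], ...)
  }
  invisible(x)
}

toks_toy <- toy_tokens(list(
  d1 = c("Fellow", "citizens", ","),
  d2 = c("New York", "is", "big")
))
print(toks_toy)                 # tokens are shown with quotes
print(toks_toy, quote = FALSE)  # quotes are dropped by print.default()
```

Because the method itself does not name quote, the default behaviour is unchanged; only callers who pass the argument see unquoted output.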

Ideally, there would be an option to print (or coerce) tokens like this, but I don't know the best way to do it.

lapply(toks, function(x) paste0(head(x, 12), collapse = " "))
#> $`1789-Washington`
#> [1] "Fellow-Citizens of the Senate and of the House of Representatives : Among"
#> 
#> $`1793-Washington`
#> [1] "Fellow citizens , I am again called upon by the voice of"
#> 
#> $`1797-Adams`
#> [1] "When it was first perceived , in early times , that no"

@koheiw koheiw requested a review from kbenoit April 23, 2024 04:35
Collaborator

@kbenoit kbenoit left a comment

Ah... that's a somewhat radical change. Makes it especially hard to distinguish tokens that contain spaces. What if we added this as an option, with the default to keep the quotes?
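
One way such an opt-in could look is sketched below. The helper name print_tokens_sketch and the option name toy_print_tokens_quote are invented for illustration and are not part of quanteda; the sketch assumes the input is a plain named list of character vectors. Quotes stay on by default and can be turned off per call or per session.

```r
# Hypothetical helper: prints a named list of character vectors.
# quote defaults to a made-up global option and falls back to TRUE.
print_tokens_sketch <- function(x,
                                quote = getOption("toy_print_tokens_quote", TRUE),
                                ...) {
  for (doc in names(x)) {
    cat(doc, ":\n")
    print(x[[doc]], quote = quote, ...)
  }
  invisible(x)
}

docs <- list(d1 = c("New York", "is", "big"))
print_tokens_sketch(docs)                 # quoted by default
print_tokens_sketch(docs, quote = FALSE)  # per-call opt-out
options(toy_print_tokens_quote = FALSE)   # session-wide opt-out
print_tokens_sketch(docs)
options(toy_print_tokens_quote = NULL)    # restore the default
```

With quote = FALSE the first document prints as New York is big, which shows the ambiguity for tokens that contain spaces.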

@koheiw
Collaborator Author

koheiw commented Apr 23, 2024

I agree that tokens should be quoted by default, but it would be nice to think about how to make the output easier to read. For example,

#>  [1] "Fellow-Citizens" "of" "the" "Senate"         
#>  [5] "and" "of" "the" "House"          
#>  [9] "of" "Representatives" ":" "Among" 

Not urgent, though...
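
A rough sketch of the compact quoted layout above, using only base R: encodeString() adds the quotes and cat() with fill wraps lines between elements, so a token that contains a space is never split. The function name print_compact is hypothetical, and the sketch omits the [1], [5], ... index prefixes shown in the example.

```r
# Hypothetical helper: print quoted tokens without column padding.
print_compact <- function(x, width = 40) {
  quoted <- encodeString(x, quote = '"')  # add (and escape) double quotes
  cat(quoted, fill = width)               # wrap between elements only
  invisible(x)
}

x <- c("Fellow-Citizens", "of", "the", "Senate", "and", "of", "the",
       "House", "of", "Representatives", ":", "Among")
print_compact(x)
```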
