[kwic] Error: cannot allocate vector of size 2.0 Gb #2171

Open
dnguyen-td opened this issue Apr 14, 2022 · 1 comment

Comments

@dnguyen-td

## Describe the bug

I used `kwic()` to find keywords in context. The object is 429 MB. R stopped with the error "Error: cannot allocate vector of size 2.0 Gb", and I don't know how to fix it.

## Reproducible code

Please paste minimal code that reproduces the bug. If possible, please upload the data file as .rds.

```r
x <- kwic(tokens(corpus(all_item1_2010)),
          pattern = keywords,
          valuetype = "regex",
          window = 15)
```

## Expected behavior

I ran the same code on another object of 446 MB and it worked fine, even though that object is larger.

## System information

Please run sessionInfo() and paste the output.

```
R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] writexl_1.4.0           WriteXLS_6.3.0          quanteda.textplots_0.94 stringi_1.7.6
 [5] lubridate_1.8.0         stringr_1.4.0           readtext_0.81           quanteda_3.2.0
 [9] ggplot2_3.3.5           tidyr_1.2.0             dplyr_1.0.8             readr_2.1.2

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.8         pillar_1.7.0       compiler_4.1.2     tools_4.1.2        stopwords_2.3
 [6] lifecycle_1.0.1    tibble_3.1.6       gtable_0.3.0       lattice_0.20-45    pkgconfig_2.0.3
[11] rlang_1.0.1        Matrix_1.3-4       fastmatch_1.1-3    rstudioapi_0.13    DBI_1.1.2
[16] cli_3.1.1          xfun_0.29          httr_1.4.2         withr_2.4.3        generics_0.1.2
[21] vctrs_0.3.8        hms_1.1.1          grid_4.1.2         tidyselect_1.1.1   data.table_1.14.2
[26] glue_1.6.1         R6_2.5.1           fansi_1.0.2        tzdb_0.2.0         purrr_0.3.4
[31] magrittr_2.0.2     scales_1.1.1       ellipsis_0.3.2     assertthat_0.2.1   colorspace_2.0-2
[36] utf8_1.2.2         tinytex_0.37       RcppParallel_5.1.5 munsell_0.5.0      crayon_1.5.0
```

## Additional info

Please add any other information about the issue.
@koheiw
Collaborator

koheiw commented Apr 14, 2022

I cannot say anything precise without knowing what the corpus and keywords are, but you should check whether some of your regular expressions are producing too many matches. Try `index()` to check this.
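
A minimal sketch of that check, using toy stand-ins because the real `all_item1_2010` and `keywords` are not shown in this issue (the texts and patterns below are invented for illustration). `index()` returns one row per match with its position, so counting rows per pattern can show whether a regex matches far more tokens than intended before `kwic()` has to build a 15-token window around each hit.

```r
library(quanteda)

# Hypothetical stand-ins for the reporter's corpus and keywords
txt <- c(d1 = "The risk of market risks is rising.",
         d2 = "No relevant terms here.")
keywords <- c("risk", "^the$")

toks <- tokens(corpus(txt))

# One row per match: docname, from, to, pattern
idx <- index(toks, pattern = keywords, valuetype = "regex")

# Total number of matches, and matches per pattern (largest first)
n_matches <- nrow(idx)
per_pattern <- sort(table(idx$pattern), decreasing = TRUE)

print(n_matches)
print(per_pattern)
```

If one pattern dominates the counts, anchoring it (for example `^risk$` instead of `risk`) or using a narrower `window` may reduce the memory that `kwic()` needs, though whether that resolves this particular error depends on the actual data.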
