
Errors interrupting the text extraction process. #164

Open
bes827 opened this issue Jul 23, 2020 · 1 comment


bes827 commented Jul 23, 2020

I am trying to extract text from a large number of Word files (1500) in one folder using readtext, after building the file list with list.files.

Some files produce errors (examples below), and when one occurs the whole extraction stops. I can identify the problematic file by setting verbosity = 3, but then I have to restart the extraction to find the next problematic file.

Is there a way to keep the process from being interrupted when an error is encountered?

I set ignore_missing_files = TRUE, but this did not fix the problem.

Examples of the errors encountered:

write error in extracting from zip file
Error: 'C:\Users--- c/word/document.xml' does not exist.
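
One possible workaround, sketched here under assumptions rather than taken from readtext itself, is to read the files one at a time and wrap each call in base R's tryCatch, so a failing file is logged and skipped instead of aborting the batch. The helper name read_safely and the folder name "word_files" are hypothetical; the reader argument defaults to readtext::readtext but can be swapped for any function that takes a path.

```r
# Read each path separately; on error, report the file and carry on.
# `read_safely` is a hypothetical helper, not part of readtext.
read_safely <- function(paths, reader = readtext::readtext) {
  results <- lapply(paths, function(p) {
    tryCatch(
      reader(p),
      error = function(e) {
        message("Skipping ", p, ": ", conditionMessage(e))
        NULL  # marks this file as failed
      }
    )
  })
  ok <- !vapply(results, is.null, logical(1))
  # texts: one successful result per readable file; failed: paths that errored
  list(texts = results[ok], failed = paths[!ok])
}

# Hypothetical folder name; replace with the real location of the Word files.
files <- list.files("word_files", pattern = "\\.docx?$", full.names = TRUE)
out <- read_safely(files)
out$failed  # every problematic file, collected in a single pass
```

Because all failures end up in out$failed, the problematic files can be inspected after a single run, and the successful results in out$texts can then be combined (for example with do.call(rbind, out$texts), assuming the reader returns data frames).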

@michalovadek

I second the general idea of readtext coming with some error catching mechanism, because it can waste hours reading in a big batch of files only to then fail at some point with nothing to show for it.

A typical issue for me is an .rtf file saved as .doc by its creator, which antiword cannot process and so exits with an error; in this particular case it would be nice if readtext automatically tried the rtf reader when antiword fails, on the guess that the file is actually an rtf file.
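
Until something like that exists, a rough user-side sketch of the same idea is to check the file signature before reading: RTF files begin with the bytes `{\rtf`. The helpers looks_like_rtf and as_rtf_copy below are hypothetical names, and copying the file to a temporary .rtf path is only an assumed way to make an extension-based reader pick the rtf route.

```r
# TRUE if the file starts with the RTF signature "{\rtf".
# `looks_like_rtf` is a hypothetical helper, not part of readtext.
looks_like_rtf <- function(path) {
  con <- file(path, "rb")
  on.exit(close(con))
  header <- readBin(con, what = "raw", n = 5L)
  identical(header, charToRaw("{\\rtf"))
}

# Copy a mislabelled file to a temporary path with an .rtf extension,
# so that a reader choosing its parser by extension treats it as RTF.
as_rtf_copy <- function(path) {
  target <- tempfile(fileext = ".rtf")
  file.copy(path, target)
  target
}

# Pick the path to hand to the reader: the RTF copy for disguised files,
# otherwise the original path unchanged.
path_for_reading <- function(path) {
  if (looks_like_rtf(path)) as_rtf_copy(path) else path
}
```

A call such as readtext::readtext(path_for_reading(f)) would then read disguised files as RTF, with the caveat that the resulting document id would reflect the temporary file name rather than the original one.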
