
training on large data set is failing #120

Open
lihka1 opened this issue Feb 4, 2022 · 3 comments

Comments


lihka1 commented Feb 4, 2022

When I try to train on a large text dataset (a file of around 500 MB), it fails with this error:

[info] loading text
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted

Any ideas please?

bakwc (Owner) commented Feb 4, 2022

Try using a machine with more memory, or possibly a large swap file (training may then take a really long time).
Alternatively, you can purchase the "Pro" version, which includes a memory optimization that reduces memory consumption during training.
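As a stopgap when neither more RAM nor the Pro version is an option, one common workaround (not a JamSpell feature, just a generic corpus trick) is to train on a random sample of the corpus. A minimal sketch, assuming a plain-text corpus with one sentence per line; the file names are placeholders:

```python
import random

# Stand-in for the real 500 MB corpus: write a small example file.
with open("corpus.txt", "w") as f:
    for i in range(1, 1001):
        f.write(f"line {i}\n")

# Keep roughly 1 line in 10 so training fits in memory.
# Streaming line by line avoids loading the whole corpus into RAM.
random.seed(42)
kept = 0
with open("corpus.txt") as src, open("corpus_sampled.txt", "w") as dst:
    for line in src:
        if random.random() < 0.1:
            dst.write(line)
            kept += 1

print(f"kept {kept} of 1000 lines")
```

A smaller sample reduces model quality, so this only trades accuracy for memory; raising the keep probability as far as RAM allows is the usual compromise.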

lihka1 (Author) commented Feb 4, 2022

Thanks.

Also, one more question:
[image]
I trained on a smaller dataset.

When I try to fix a sample sentence like "Otb.r Compreh�nsiYe Incone", the model outputs "Otbr ComprehensIve Income".

Is there any way to get "Otb.r" corrected to "Other"? I can see in the training data there are many bigrams and trigrams of "other comprehensive income".

@mirfan899

@lihka1 it depends on the alphabet you used for training the model. Here your input contains a `.`, but that character is not in the alphabet, so the model will not correct this word.
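To illustrate the point above: this is not JamSpell's actual code, just a hypothetical sketch of the effect described. Characters absent from the training alphabet are effectively invisible to the model, so from its point of view "Otb.r" is just "Otbr", and it can only propose corrections for that reduced string:

```python
# Assumed alphabet file contents: lowercase Latin letters only, no '.'.
alphabet = set("abcdefghijklmnopqrstuvwxyz")

def visible(word, alphabet):
    """Keep only the characters the model was trained on (illustration only)."""
    return "".join(ch for ch in word.lower() if ch in alphabet)

print(visible("Otb.r", alphabet))  # the '.' is stripped -> "otbr"
```

If you want `.` handled, it would presumably need to be part of the alphabet at training time; whether that yields the correction "Other" still depends on the language model, so this is only the first prerequisite.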
