
training on large data set is failing #120

Open
lihka1 opened this issue Feb 4, 2022 · 3 comments

Comments


lihka1 commented Feb 4, 2022

When I try to train on a large text dataset (a file of around 500 MB), it fails with this error:

[info] loading text
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted

Any ideas please?

bakwc (Owner) commented Feb 4, 2022

Try using a machine with more memory, or possibly a large swap file (training may then take a really long time).
Alternatively, you can purchase the "Pro" version, which includes a memory optimization that reduces memory consumption during training.
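As a stopgap when neither more RAM nor the Pro version is an option, one common workaround (not a JamSpell feature, just a generic corpus trick) is to train on a random sample of the corpus. A minimal sketch, assuming a plain-text corpus with one sentence per line; the file names are placeholders:

```python
import random

# Stand-in for the real 500 MB corpus: write a small example file.
with open("corpus.txt", "w") as f:
    for i in range(1, 1001):
        f.write(f"line {i}\n")

# Keep roughly 1 line in 10 so training fits in memory.
# Streaming line by line avoids loading the whole corpus into RAM.
random.seed(42)
kept = 0
with open("corpus.txt") as src, open("corpus_sampled.txt", "w") as dst:
    for line in src:
        if random.random() < 0.1:
            dst.write(line)
            kept += 1

print(f"kept {kept} of 1000 lines")
```

A smaller sample reduces model quality, so this only trades accuracy for memory; raising the keep probability as far as RAM allows is the usual compromise.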

lihka1 (Author) commented Feb 4, 2022

Thanks.

Also, one more question:
[image]
I trained on a smaller dataset.

When I try to fix a sample sentence like "Otb.r Compreh�nsiYe Incone", the model outputs "Otbr ComprehensIve Income".

Is there any way to get "Otb.r" corrected to "Other"? I can see in the training data there are many bigrams and trigrams of "other comprehensive income".

@mirfan899

@lihka1 it depends on the alphabet you used for training the model. Here your input contains a `.`, but that character is not in the alphabet, so the model will not correct this word.
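To illustrate the point above: this is not JamSpell's actual code, just a hypothetical sketch of the effect described. Characters absent from the training alphabet are effectively invisible to the model, so from its point of view "Otb.r" is just "Otbr", and it can only propose corrections for that reduced string:

```python
# Assumed alphabet file contents: lowercase Latin letters only, no '.'.
alphabet = set("abcdefghijklmnopqrstuvwxyz")

def visible(word, alphabet):
    """Keep only the characters the model was trained on (illustration only)."""
    return "".join(ch for ch in word.lower() if ch in alphabet)

print(visible("Otb.r", alphabet))  # the '.' is stripped -> "otbr"
```

If you want `.` handled, it would presumably need to be part of the alphabet at training time; whether that yields the correction "Other" still depends on the language model, so this is only the first prerequisite.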
