INFO: Resuming from iteration for provided data will fetch data until required iteration ... #3247

Open
H4dr1en opened this issue May 15, 2024 · 3 comments

H4dr1en commented May 15, 2024

❓ Questions/Help/Support

I observed a message in the training logs that I don't understand. Could you please clarify what happens here and why?

INFO: Resuming from iteration for provided data will fetch data until required iteration ...

This happens for all of my validation engines, which I create as follows:

        for valid_dataset_name, valid_engine in valid_engines.items():
            valid_loader = valid_loaders[valid_dataset_name]
            train_engine.add_event_handler(Events.EPOCH_COMPLETED, partial(valid_engine.run, data=valid_loader))

Note: I use DeterministicEngine for all engines (training and validation).


vfdev-5 commented May 15, 2024

The message means that the deterministic engine is trying to resume the run from a non-zero iteration. For deterministic engines we have to rewind the dataloader up to the resuming iteration, otherwise the random state won't be fully respected (there is probably more context here: https://pytorch.org/ignite/engine.html#dataflow-synchronization).
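To illustrate the rewinding idea, here is a toy sketch in plain Python. It is not ignite's actual implementation: `make_batches` and `resume_from` are hypothetical names, and the seeded generator only stands in for a dataloader whose output depends on random state. The point is that resuming at iteration N means fetching and discarding the first N batches, so the remaining batches match an uninterrupted run.

```python
import random
from itertools import islice


def make_batches(seed, num_batches, batch_size=4):
    """Hypothetical data source: each batch depends on the RNG state so far."""
    rng = random.Random(seed)
    for _ in range(num_batches):
        yield [rng.randint(0, 99) for _ in range(batch_size)]


def resume_from(data_iter, start_iteration):
    """Fetch and discard batches until the requested iteration is reached."""
    for _ in islice(data_iter, start_iteration):
        pass  # the batch is thrown away; only the side effect on the RNG matters
    return data_iter


# An uninterrupted run over 6 iterations.
full_run = list(make_batches(seed=0, num_batches=6))

# A fresh iterator, rewound to iteration 3 before being consumed.
resumed = list(resume_from(make_batches(seed=0, num_batches=6), start_iteration=3))

# The resumed run sees exactly the batches the full run saw from iteration 3 on.
assert resumed == full_run[3:]
```

Under this picture, the log message would be expected only when the engine believes it is starting from a non-zero iteration.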

Given the code you provide, I would say this is more like a bug.
Probably, valid_engine was stopped at some point without getting the full 1 epoch and then it was called to run again...


H4dr1en commented May 15, 2024

> Given the code you provide, I would say this is more like a bug.

Do you mean a bug in ignite or in my code?

> Probably, valid_engine was stopped at some point without getting the full 1 epoch and then it was called to run again...

I am not stopping any of the valid_engines. They all run for a single full epoch of validation after each training epoch.


vfdev-5 commented May 15, 2024

> Do you mean a bug in ignite or in my code?

Difficult to say without more information. Could you provide more code to reproduce the issue?
