Generated data splits should be tracked along with model outputs, and not stored with data. #286

yalaudah · 2020-04-23T19:29:17Z

When running the prepare_dutchf3.py script, the generated data splits should be stored and tracked along with the outputs of each model run (logs, snapshots, configs, etc.), not separately. This means the split generation should run as part of data preparation in the training scripts, on every run, rather than once up front. We should also make sure that all required parameters (e.g. stride or section_stride) are stored in the config files.

Otherwise, newer model runs might use older data splits, and there is no way to track which data split was used with a model.
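
To make the proposal concrete, here is a rough sketch of what per-run split generation could look like. It is not the actual prepare_dutchf3.py interface: the function name, the `val_fraction` parameter, the file names, and the directory layout are all hypothetical, and only `section_stride` is taken from the parameters mentioned above.

```python
import hashlib
import json
import random
from pathlib import Path


def prepare_splits_for_run(run_dir, split_params, section_ids, seed=0):
    """Generate train/val splits inside the run's own output directory.

    Hypothetical helper for illustration only; it does not mirror the
    real prepare_dutchf3.py script.
    """
    run_dir = Path(run_dir)
    split_dir = run_dir / "splits"
    split_dir.mkdir(parents=True, exist_ok=True)

    # Subsample sections by the stride, then shuffle deterministically
    # so the same parameters and seed always give the same split.
    ids = list(section_ids)[:: split_params["section_stride"]]
    random.Random(seed).shuffle(ids)
    n_val = int(len(ids) * split_params["val_fraction"])
    splits = {"val": sorted(ids[:n_val]), "train": sorted(ids[n_val:])}

    # Write the split files next to the run's logs and snapshots.
    for name, members in splits.items():
        text = "\n".join(str(m) for m in members) + "\n"
        (split_dir / f"{name}.txt").write_text(text)

    # Record the exact parameters, plus a short fingerprint of them,
    # so a model can later be matched to the split it was trained on.
    record = {"split_params": split_params, "seed": seed}
    record["fingerprint"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()[:12]
    (run_dir / "split_config.json").write_text(json.dumps(record, indent=2))
    return splits, record
```

A training script would call something like this at startup with its own output directory, so the splits and the parameters that produced them end up alongside that run's logs, snapshots, and configs.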

Note: This also requires changes to the Docker implementation, and to the README file.


yalaudah added the "Type: Enhancement" label (an enhancement to an existing feature) Apr 23, 2020

yalaudah added this to the V0.4 [Datasets] milestone Apr 23, 2020

yalaudah added this to Mn: Backlog in Manganese May 28, 2020

maxkazmsft modified the milestones: V0.2 [June release], V0.2.1 [July release] Jun 7, 2020
