Dataset fixtures #115

Closed
wiederm opened this issue May 9, 2024 · 1 comment · Fixed by #118
Labels: refactoring (Improve the quality of the code without functional changes)

Comments

@wiederm
Member

wiederm commented May 9, 2024

We need to clean up the fixtures defined in conftest.py; I propose the following structure:

We have a central place (maybe modelforge.dataset and modelforge.potential) that holds a dictionary of the implemented NNPs and datasets, in which the name of each dataset maps to the class of that dataset (this is already implemented).
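A minimal sketch of such a name-to-class registry, for illustration only. The class names `QM9Dataset` and `ANI1xDataset`, the dictionary name `_IMPLEMENTED_DATASETS`, and the helper `get_dataset_class` are hypothetical placeholders, not the actual modelforge identifiers.

```python
from typing import Dict, Type


# Hypothetical stand-ins for the real dataset classes.
class QM9Dataset:
    pass


class ANI1xDataset:
    pass


# Central registry: dataset name -> dataset class (names are placeholders).
_IMPLEMENTED_DATASETS: Dict[str, Type] = {
    "QM9": QM9Dataset,
    "ANI1X": ANI1xDataset,
}


def get_dataset_class(dataset_name: str) -> Type:
    """Look up a dataset class by name, ignoring case."""
    try:
        return _IMPLEMENTED_DATASETS[dataset_name.upper()]
    except KeyError:
        known = ", ".join(sorted(_IMPLEMENTED_DATASETS))
        raise ValueError(
            f"Unknown dataset '{dataset_name}'. Known datasets: {known}"
        ) from None
```

With a registry like this, fixtures and tests only need the dataset name as a string, and the list of names to parametrize over can be taken from the dictionary keys.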

We have fixtures with the following base structure:

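import pytest
from typing import Optional

# DataModule, SplittingStrategy and FirstComeFirstServeSplittingStrategy are
# assumed to be importable from modelforge.dataset.


@pytest.fixture
def dataset_factory():
    """Return a factory so each test can request the dataset it needs."""

    def create_dataset(
        dataset_name: str, for_unit_testing: bool = True, batch_size: int = 64
    ):
        return initialize_dataset(
            dataset_name, for_unit_testing=for_unit_testing, batch_size=batch_size
        )

    return create_dataset


def initialize_dataset(
    dataset_name: str,
    for_unit_testing: bool = True,
    batch_size: int = 64,
    splitting_strategy: SplittingStrategy = FirstComeFirstServeSplittingStrategy(),
    split_file: Optional[str] = None,  # assumed parameter: optional path to a saved split
) -> DataModule:
    """Initialize a dataset for a given mode."""
    # NOTE: for_unit_testing is accepted here but not forwarded to DataModule.
    data_module = DataModule(
        dataset_name,
        splitting_strategy=splitting_strategy,
        split_file=split_file,
        batch_size=batch_size,
    )
    data_module.prepare_data()
    data_module.setup()
    return data_module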

For each test that should iterate over the datasets and, e.g., different batch sizes, we explicitly add @pytest.mark.parametrize.

@wiederm
Member Author

wiederm commented May 9, 2024

This should be done after we merge PR #112

@wiederm wiederm linked a pull request May 11, 2024 that will close this issue
3 tasks
@wiederm wiederm self-assigned this May 16, 2024
@wiederm wiederm added optimization Issues related to improving performance. refactoring Improve the quality of the code without functional changes and removed optimization Issues related to improving performance. labels May 16, 2024