Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consistency between MapDataset.stack and Datasets.stack_reduce #5261

Open
registerrier opened this issue May 15, 2024 · 3 comments · May be fixed by #5270
Open

Consistency between MapDataset.stack and Datasets.stack_reduce #5261

registerrier opened this issue May 15, 2024 · 3 comments · May be fixed by #5270
Labels
Milestone

Comments

@registerrier
Copy link
Contributor

Currently Datasets.stack_reduce() will first apply the to_masked() method to the first dataset.
Using MapDataset.stack the mask is not applied to self, the assumption being that the first dataset used is empty.
While this is probably true in general, this might lead to some issues. At least the docstring should more clearly state this.
Note also that MapDatasetOnOff.stack applies the mask to self.counts_off, which is then inconsistent.

total_off.stack(

See associated PRs : #3058 #3164

Is there any reason for this specific behavior?

@AtreyeeS
Copy link
Member

If I remember correctly, the behaviour in MapDataset.stack is intentional. It was the cleanest way to be able to stack maps during the data reduction process while handling the safe mask correctly.
This is included in the Note here https://docs.gammapy.org/1.2/user-guide/datasets/index.html#stacking-multiple-datasets - "To properly handle masks, it is necessary to stack onto an empty dataset."
Datasets.stack_reduce was introduced to avoid users stacking onto non-empty maps. The idea was general users should only use Datasets.stack_reduce and not MapDataset.stack. I guess we did not expose these consistently.

Not sure what is happening in MapDatasetOnOff.

@adonath
Copy link
Member

adonath commented May 15, 2024

Using MapDataset.stack the mask is not applied to self, the assumption being that the first dataset used is empty.

If I remember correctly, the first implementation of MapDataset.stack() had the in-place application of the mask. This lead to a lot of complex code, because one needed to apply the mask to all quantities in the dataset. The same code was then repeated for the "other" dataset. It was much simpler to introduce the .to_masked() method, with all the mask application in a single place.

Applying the mask for in-place stacking would mean to always apply it, because one cannot know whether the stacking happens the first time. This means, except for the first time, one would un-necessarily apply the (stacked) mask over and over again.

But independently, stack_reduce() is the higher level method anyway. So I think there is no need for consistency. However the behavior of the on-off dataset should be fixed.

@registerrier
Copy link
Contributor Author

OK thanks. Not re-masking all the time seems a reasonable approach indeed. While solving #5245 , I try to adapt the docstring and the behaviour of MapDatasetOnOff.stack()

@registerrier registerrier added this to the 1.2.1 milestone May 16, 2024
@registerrier registerrier linked a pull request May 17, 2024 that will close this issue
@registerrier registerrier linked a pull request May 17, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants