Add sigma v estimator functionality for dark matter use case #2075

Open · wants to merge 46 commits into main from sigma_estimator

Conversation

@Bultako (Member) commented Mar 7, 2019

This PR adds a new class SigmaVEstimator in astro.darkmatter.utils as a first attempt to address a general use case in the dark matter community. See a preliminary notebook using SigmaVEstimator here.

It still needs improvement, mainly a clear method for estimating upper limits when fitting extremely low signals compared to the background model. Once this point is solved, tests can be added accordingly.

For the moment I am leaving it as an open, work-in-progress PR, in case it needs discussion at this stage.

@Bultako Bultako self-assigned this Mar 7, 2019
@Bultako Bultako added the feature label Mar 7, 2019
@adonath adonath self-requested a review March 8, 2019 08:21
@adonath adonath modified the milestones: 0.11, 0.12 Mar 8, 2019
@adonath (Member) left a comment

Thanks @Bultako! This is a very nice contribution and feature for the dark-matter community. I've left a few in-line comments.

Another comment I have is whether you could split the .run() method, because currently it's very long. At least introduce a method that computes the likelihood profiles and stores them in a Table, and a second method that starts from the table and computes the best-fit values and upper limits from the likelihood profile. This way the code becomes easier to debug and reuse, e.g. when the brentq fails. The .run() method could stay and just call both steps one after another. Again, you might want to take a look at how the FluxPointEstimator is implemented.
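For illustration, a rough sketch of that split (the class name, the compute_profiles / compute_upper_limits method names, and the _fit_one helper are hypothetical, not part of this PR):

from astropy.table import Table


class SigmaVEstimatorSketch:
    """Two-step interface: compute the profiles first, derive the limits second."""

    def __init__(self, channels, masses):
        self.channels = channels  # list of annihilation channel names
        self.masses = masses      # list of DM masses (plain floats here, for simplicity)
        self.profiles = None

    def _fit_one(self, channel, mass):
        """Placeholder for the actual Fit + fit.likelihood_profile() call."""
        raise NotImplementedError

    def compute_profiles(self):
        """Fit each (channel, mass) combination and store its likelihood profile in a Table."""
        rows = []
        for channel in self.channels:
            for mass in self.masses:
                profile = self._fit_one(channel, mass)
                rows.append(
                    {
                        "channel": channel,
                        "mass": mass,
                        "values": profile["values"],
                        "likelihood": profile["likelihood"],
                    }
                )
        self.profiles = Table(rows=rows)
        return self.profiles

    def compute_upper_limits(self, profiles=None):
        """Derive best-fit values and upper limits from the stored profiles."""
        profiles = self.profiles if profiles is None else profiles
        # interpolation + brentq on each row's profile would go here
        return profiles

    def run(self):
        """Convenience wrapper chaining both steps."""
        return self.compute_upper_limits(self.compute_profiles())

With this layout, a failed brentq only requires rerunning the second step on the cached profiles.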

jfact = 3.41e19 * u.Unit("GeV2 cm-5")
channels = ["b", "t", "Z"]
masses = [70, 200, 500, 5000, 10000, 50000, 100000]*u.GeV
estimator = SigmaVEstimator(simulated_dataset, masses, channels, jfact=jfact)
@adonath (Member)

I think this example does not work, as the simulated_dataset is missing...I guess now that #2064 is merged it can be added?

xsection = DMAnnihilation.THERMAL_RELIC_CROSS_SECTION
self.xsection = xsection

self._spatial_model = dataset.model.spatial_model
@adonath (Member)

Is it really useful to introduce these "private" variables? Why not create and use local variables later?

self._psf = dataset.psf
self._edisp = dataset.edisp

self._counts_map = WcsNDMap(self._geom, np.random.poisson(dataset.npred().data))
@adonath (Member)

I think the simulated dataset already contains a simulated counts map, so maybe remove?

@Bultako (Member Author)

I had imagined calling this SigmaVEstimator.run() inside an external loop of n runs (see cell [13] in the notebook https://github.com/peroju/dm-gammapy/blob/master/notebooks/DarkMatterUseCaseUpgraded.ipynb and the TODO comment in the code cell). Every call in the loop should create a different random Poisson counts map from our simulated dataset, and the results of each individual fit in the loop would be averaged at the end.

I had thought of providing a separate tool later (if needed) that builds this loop of n runs, calling SigmaVEstimator.run() internally in the loop and returning the averaged results.

Does this make sense?
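For concreteness, that outer loop might look roughly like this (n_runs and the "sigma_v" column name are illustrative, not actual API):

import numpy as np

n_runs = 100
limits_per_run = []

for _ in range(n_runs):
    # each call re-simulates a Poisson counts map internally and fits it
    result = estimator.run()
    # result is a dict of tables keyed by channel; "sigma_v" is an assumed column name
    limits_per_run.append(np.asarray(result["b"]["sigma_v"]))

# average the per-run values for channel "b", mass by mass
mean_sigma_v = np.mean(limits_per_run, axis=0)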

Dict with results as `~astropy.table.Table` objects for each channel.
"""

result = {}
@adonath (Member)

Would it make sense to use a flattened data structure here? I.e. use an astropy.table.Table with a mass and channel column? For every combination of channel and mass one row is added to the table. I think this simplifies the code as well as the data structure that is returned.
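For illustration, the flattened layout would look something like this (the numbers and column names are made up):

from astropy.table import Table

# one row per (channel, mass) combination; values are placeholders
result = Table(
    rows=[
        ("b", 70.0, 1.2e-25),
        ("b", 200.0, 8.5e-26),
        ("t", 200.0, 9.1e-26),
    ],
    names=("channel", "mass", "sigma_v_ul"),
)

# a per-channel view is still easy to recover if needed
result_b = result[result["channel"] == "b"]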

@Bultako (Member Author)

There is a certain hierarchy to be kept in the results (at least from the user-access point of view) where every channel has a list of (mass, sigma-v, err, etc.) values. This is the main reason why tables are linked to dictionary items (channels).

)
try:
fit = Fit(dataset_loop)
fit.datasets.parameters.apply_autoscale = False
@adonath (Member)

As you use the fit.likelihood_profile() method now, it's not necessary to switch off the autoscaling. Maybe just remove...

fit_result = fit.run(optimize_opts, covariance_opts)
likemin = fit_result.total_stat
profile = fit.likelihood_profile(
parameter="scale", bounds=5, nvalues=50
@adonath (Member)

I would definitely make the bounds and nvalues configurable. Maybe take a look at what we did for the FluxPointEstimator.
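That is, something along these lines (the exact signature is only a suggestion):

class SigmaVEstimator:
    def __init__(self, dataset, masses, channels, jfact=None, bounds=5, nvalues=50):
        # store the profile settings so .run() no longer hard-codes them
        self.bounds = bounds
        self.nvalues = nvalues
        ...

# later, inside the fitting loop:
# profile = fit.likelihood_profile(parameter="scale", bounds=self.bounds, nvalues=self.nvalues)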

parameter="scale", bounds=5, nvalues=50
)
xvals = profile["values"]
yvals = profile["likelihood"] - likemin - 2.71
@adonath (Member)

Again, I would introduce a ts_diff variable to make it configurable (using 2.71 as the default value). If you don't want to make it configurable at all, please add it as a global variable TS_DIFF and don't hard-code the number in the method.
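For reference, 2.71 is the TS difference for a one-sided 95% C.L. upper limit with one free parameter. As a module-level default it could look like this (the helper function is hypothetical):

TS_DIFF = 2.71  # delta TS for a one-sided 95% C.L. upper limit (1 d.o.f.)


def shifted_profile(profile, likemin, ts_diff=TS_DIFF):
    """Shift the likelihood profile so its positive root marks the upper limit."""
    return profile["likelihood"] - likemin - ts_diff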

scale_max = max(xvals)

scale_found = brentq(
interp1d(xvals, yvals, kind="cubic"),
@adonath (Member)

I think cubic interpolation is probably not suitable for a likelihood profile. I presume quadratic should be more stable (not sure whether it makes a difference, though...). There is also a helper function _interp_likelihood_profile that you could use.
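As a self-contained illustration of the root finding with quadratic interpolation (the parabola below stands in for a real profile from fit.likelihood_profile()):

import numpy as np
from scipy.interpolate import interp1d
from scipy.optimize import brentq

# toy profile: minimum at scale = 2, already shifted by likemin and the 2.71 threshold
xvals = np.linspace(0, 10, 50)
yvals = (xvals - 2.0) ** 2 - 2.71

f = interp1d(xvals, yvals, kind="quadratic")

# the root above the best-fit value gives the upper limit on the scale parameter
best_fit_scale = 2.0    # f is negative here
scale_max = xvals.max()  # f is positive here
scale_found = brentq(f, best_fit_scale, scale_max)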

@Bultako Bultako force-pushed the sigma_estimator branch 2 times, most recently from 208e452 to a7fae18 Compare March 26, 2019 19:11
@Bultako (Member Author) commented Mar 27, 2019

Thanks @adonath for the review.
I've gone through all your suggestions; I've applied most of them and explained in the review comments why I didn't apply the others :) You can re-review if you feel like it. Anyway, I'd suggest keeping this PR open while I check everything with users.

@Bultako Bultako force-pushed the sigma_estimator branch 3 times, most recently from 19e0ec1 to b68f711 Compare April 1, 2019 16:16
@adonath adonath modified the milestones: 0.12, 0.13 May 28, 2019
@adonath adonath modified the milestones: 0.13, 0.14 Jul 22, 2019
@cdeil cdeil modified the milestones: 0.14, 0.15 Sep 2, 2019
@Bultako Bultako force-pushed the sigma_estimator branch 6 times, most recently from 683b972 to 75d63f9 Compare September 18, 2019 18:10
Labels: feature
Projects: gammapy.astro (In progress)
Linked issues: none yet
3 participants