Analysis class cleanup to support future extension of the hli #3852
base: main
Conversation
Codecov Report

@@           Coverage Diff            @@
##           master    #3852    +/-  ##
=======================================
  Coverage   93.78%   93.78%
=======================================
  Files         162      163      +1
  Lines       20138    20222     +84
=======================================
+ Hits        18886    18965     +79
- Misses       1252     1257      +5
Thanks a lot @QRemy, I think this goes exactly in the right direction! I especially like the improved code organization and the possibility to extend the Analysis pipeline with a registry.
I have left a few general comments with proposals for the API and some remaining questions for now. I think it might be good to discuss this in a bit more detail again, maybe in a dedicated meeting next week?
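The registry idea mentioned above could work along the following lines. This is a minimal sketch only; the names (`ANALYSIS_STEP_REGISTRY`, `register_step`, `get_step`, `DataReductionStep`) are illustrative stand-ins, not the actual Gammapy API.

```python
# Minimal sketch of an analysis-step registry, as one possible way to
# make the Analysis pipeline user-extensible. All names are illustrative.

ANALYSIS_STEP_REGISTRY = {}


def register_step(cls):
    """Class decorator that records a step class under its ``tag``."""
    ANALYSIS_STEP_REGISTRY[cls.tag] = cls
    return cls


@register_step
class DataReductionStep:
    tag = "data-reduction"

    def run(self, datasets):
        # data reduction would happen here; pass datasets through for the sketch
        return datasets


def get_step(tag):
    """Look up a step class by tag, so user code can register new steps."""
    return ANALYSIS_STEP_REGISTRY[tag]
```

With this pattern, a user-defined step only needs the decorator and a unique `tag` to become available to the pipeline.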
gammapy/analysis/steps.py
Outdated
requires_datasets = False
requires_models = False

def __init__(self, analysis, name=None, overwrite=True):
On first look, taking the Analysis class on __init__ does not make sense to me. This introduces a "cross dependency", while in fact I think it can be hierarchical, in the sense that the Analysis class is built from AnalysisStep classes. I think this can be resolved by slightly refactoring the API of the AnalysisStep class, along the lines of:
class AnalysisStep:
    """Analysis step class"""

    tag = "analysis-step"

    def __init__(self, analysis_sub_config, overwrite=True, log=None):
        self.config = analysis_sub_config
        self.overwrite = overwrite

        if log is None:
            log = logging.getLogger(__name__)

        self.log = log

    @property
    def maker_config(self):
        # translate analysis sub config to Gammapy API config here
        return config

    def run(self, datasets, models=None):
        maker = Maker(**self.maker_config)
        # returning might be optional... changes could happen in place
        return datasets
Maybe, at a minimum, you could take the AnalysisConfig on init and pass the Analysis to run().
For now I changed it to take the AnalysisConfig on init and pass the Analysis to run(), but in the next PR I will introduce a specific AnalysisProducts container to return outputs and pass data references on run().
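The interim API agreed here (config on init, Analysis passed to run()) could look roughly like the following. This is a sketch under simplified assumptions: the `products` attribute and the stand-in `Analysis` class are hypothetical, not the real Gammapy classes.

```python
import logging


class AnalysisStep:
    """Sketch: a step that takes the config on init and the Analysis
    object at run time. All names are illustrative only."""

    tag = "analysis-step"

    def __init__(self, config, overwrite=True, log=None):
        self.config = config
        self.overwrite = overwrite
        self.log = log or logging.getLogger(__name__)

    def run(self, analysis):
        # the step reads inputs from, and writes products back to,
        # the analysis object passed at run time
        self.log.info("Running step %s", self.tag)
        analysis.products[self.tag] = "done"
        return analysis


class Analysis:
    """Stand-in for the high-level Analysis class."""

    def __init__(self, config):
        self.config = config
        self.products = {}
```

This keeps the dependency one-directional at construction time: steps hold only their sub-config, and the full Analysis state is supplied when run() is called.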
def __init__(self, analysis, name=None, overwrite=True):
    self.analysis = analysis
    self.overwrite = overwrite
    self._name = make_name(name)
What is the name attribute used for? Is it to generate the dataset name later? But what happens for multiple datasets?
For now it is not used, but I had in mind that it would be used to select data products from specific steps.
self.analysis.datasets = Datasets([stacked])


def make_energy_axis(axis, name="energy"):
I haven't thought this through, but we could maybe even introduce an API like MapAxis.from_analysis_config(config=) and Maker.from_analysis_config() ...
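The from_analysis_config() idea could take a shape like the sketch below. The `EnergyAxis` class here is a deliberately simplified stand-in for Gammapy's MapAxis, and the config keys (`min`, `max`, `nbins`) are assumed for illustration.

```python
class EnergyAxis:
    """Stand-in for MapAxis, to sketch a from_analysis_config factory.
    Names and config keys are illustrative, not the Gammapy API."""

    def __init__(self, edges, name="energy"):
        self.edges = edges
        self.name = name

    @classmethod
    def from_analysis_config(cls, config, name="energy"):
        # translate the high-level config (min, max, nbins) into
        # log-spaced bin edges, as is conventional for energy axes
        lo, hi, n = config["min"], config["max"], config["nbins"]
        edges = [lo * (hi / lo) ** (i / n) for i in range(n + 1)]
        return cls(edges, name=name)
```

A factory like this would keep the config-to-API translation in one place per class, instead of spreading it across helper functions like make_energy_axis().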
Thanks @QRemy. This is a very ambitious change but it is definitely very interesting.
See some inline comments.
To make progress possible and have an intermediate working solution, it might be possible to pass the Analysis object to each step rather than passing it on init. Would that be OK @adonath?
path = make_path(obs_settings.obs_file)
ids = list(Table.read(path, format="ascii", data_start=0).columns[0])
selected_obs_table = self.datastore.obs_table.select_obs_id(ids)

def run(self, steps=None, overwrite=None, **kwargs):
add docstring
What does the overwrite option do?
I had in mind that each AnalysisStep could have a read/write method, but for now this is used only in the DataReductionAnalysisStep, to read the datasets if the file exists.
ids = list(Table.read(path, format="ascii", data_start=0).columns[0])
selected_obs_table = self.datastore.obs_table.select_obs_id(ids)

def run(self, steps=None, overwrite=None, **kwargs):
    if steps is None:
Maybe step creation should be done in another method?
It means they would have to be kept in memory and attached to the analysis class, which affects where the information is stored. I will try this later, after sorting out the input/output of the analysis steps.
gammapy/analysis/core.py
Outdated
def run_fit(self):
    """Fitting reduced datasets to model."""
    if not self.models:

def check_datasets(self):
Is dataset reading and/or model creation and setting a specific analysis step?
No, but it could be. For now, reading of config.general.datasets_file is done through the data-selection step, if the file exists and if overwrite is False.
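The overwrite behavior described here (reuse datasets written earlier unless overwrite is requested) could be factored into a small helper like the sketch below. The helper name and the `reader` callable are hypothetical, not part of the PR.

```python
from pathlib import Path


def load_existing_datasets(datasets_file, overwrite, reader):
    """Sketch: return previously written datasets unless overwrite is set.

    ``reader`` is any callable that loads datasets from a path; its
    name and signature are illustrative, not the Gammapy API.
    """
    path = Path(datasets_file)
    if path.exists() and not overwrite:
        # reuse the datasets written by a previous run
        return reader(path)
    # either the file is missing or the caller asked to overwrite it
    return None
```

A step's run() could then fall back to the reduction machinery only when this helper returns None, which keeps the caching decision in one testable place.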
gammapy/analysis/steps.py
Outdated
    )
]

self.analysis.check_datasets()
Why is it needed explicitly here?
Removed it for now, but in a future PR I will reintroduce a similar system to check that the data required for each step are well defined.
This PR implements the basis of a large refactoring of the HLI. A number of design choices will have to be made, and a fully expanded and improved HLI is an objective for v2.0.
Adds the remaining changes proposed in #3788 related to the cleanup of the analysis class: