
Analysis class cleanup to support future extension of the HLI #3852

Open: wants to merge 10 commits into main

Conversation

@QRemy (Contributor) commented Mar 15, 2022

Add the remaining changes proposed in #3788 related to the cleanup of the analysis class:

  • The .get_xx methods are replaced by specific AnalysisStep classes; the steps to run can be defined in the configuration file (for example config.general.steps = ["data-reduction", "fit"]) or selected interactively via analysis.run(["data-reduction", "fit"]); see the sketch below.
  • In the CLI, gammapy analysis run is now equivalent to analysis.run(), so it can execute all the available analysis steps and not only the data reduction.
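A minimal sketch of the proposed usage, based on the description above; the file name config.yaml is a placeholder and the steps field is the one introduced in this PR:

from gammapy.analysis import Analysis, AnalysisConfig

# declare the steps in the configuration, as proposed in this PR
config = AnalysisConfig.read("config.yaml")
config.general.steps = ["data-reduction", "fit"]

analysis = Analysis(config)
analysis.run()                               # execute all configured steps
analysis.run(["data-reduction", "fit"])      # or select steps interactively

On the command line, gammapy analysis run would then behave like analysis.run() and execute every configured analysis step, not only the data reduction.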

@codecov bot commented Mar 15, 2022

Codecov Report

Merging #3852 (dadb9ef) into master (50b3297) will increase coverage by 0.00%.
The diff coverage is 94.65%.

@@           Coverage Diff           @@
##           master    #3852   +/-   ##
=======================================
  Coverage   93.78%   93.78%           
=======================================
  Files         162      163    +1     
  Lines       20138    20222   +84     
=======================================
+ Hits        18886    18965   +79     
- Misses       1252     1257    +5     
Impacted Files                 Coverage Δ
gammapy/analysis/steps.py      94.05% <94.05%> (ø)
gammapy/analysis/core.py       98.19% <97.29%> (+2.57%) ⬆️
gammapy/analysis/__init__.py   100.00% <100.00%> (ø)
gammapy/analysis/config.py     100.00% <100.00%> (ø)
gammapy/scripts/analysis.py    100.00% <100.00%> (ø)


@adonath self-assigned this Mar 18, 2022
@adonath added this to the 1.0 milestone Mar 18, 2022
@adonath (Member) left a comment


Thanks a lot @QRemy, I think this goes exactly in the right direction! I especially like the improved code organization and the possibility to extend the Analysis pipeline with a registry.

I have left a few general comments with proposals for the API and some remaining questions for now. I think it might be good to discuss this in a bit more detail again, maybe in a dedicated meeting next week?

requires_datasets = False
requires_models = False

def __init__(self, analysis, name=None, overwrite=True):
Member:

At first glance, taking the analysis class on __init__ does not make sense to me. This introduces a "cross dependency", while in fact I think it can be hierarchical, in the sense that the Analysis class is built from AnalysisStep classes. I think this can be resolved by slightly refactoring the API of the AnalysisStep class, along the lines of:

import logging


class AnalysisStep:
    """Analysis step class"""
    tag = "analysis-step"

    def __init__(self, analysis_sub_config, overwrite=True, log=None):
        self.config = analysis_sub_config
        self.overwrite = overwrite

        if log is None:
            log = logging.getLogger(__name__)

        self.log = log

    @property
    def maker_config(self):
        # translate the analysis sub config to the Gammapy API config here
        return config

    def run(self, datasets, models=None):
        # Maker stands for the relevant Gammapy data reduction maker class
        maker = Maker(**self.maker_config)
        # returning might be optional...changes could happen in place
        return datasets
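A minimal sketch of how the Analysis class could then be built from such steps, assuming a simple list of step instances; this is an illustration of the hierarchy described above, not code from the PR:

class Analysis:
    """Sketch: an Analysis built from AnalysisStep instances."""

    def __init__(self, config, steps=None):
        self.config = config
        self.steps = steps or []     # list of AnalysisStep instances (assumed)
        self.datasets = None
        self.models = None

    def run(self):
        for step in self.steps:
            # data flows through run(); each step only sees its own sub-config
            self.datasets = step.run(self.datasets, models=self.models)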

Contributor:

Maybe, at a minimum, you could take the AnalysisConfig on init and pass the Analysis to run().

Contributor Author:

For now I changed it to take the AnalysisConfig on init and pass the Analysis to run(), but in the next PR I will introduce a specific AnalysisProducts container to return outputs and pass data references on run().
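A short sketch of the interim signature described in this reply; the class name FitAnalysisStep and the body are assumptions used only for illustration:

from gammapy.modeling import Fit

class FitAnalysisStep:
    """Sketch: AnalysisConfig on init, the Analysis object passed to run()."""
    tag = "fit"

    def __init__(self, config, overwrite=True):
        self.config = config          # the AnalysisConfig
        self.overwrite = overwrite

    def run(self, analysis):
        # for now results are attached to the Analysis object; a dedicated
        # AnalysisProducts container is planned for a follow-up PR
        fit = Fit()
        analysis.fit_result = fit.run(datasets=analysis.datasets)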

def __init__(self, analysis, name=None, overwrite=True):
    self.analysis = analysis
    self.overwrite = overwrite
    self._name = make_name(name)
Member:

What is the name attribute used for? Is it to generate the dataset later? And what happens for multiple datasets?

Contributor Author:

For now it is not used, but I had in mind that it would be used to select data products from specific steps.

gammapy/analysis/steps.py: three resolved review threads (collapsed)
self.analysis.datasets = Datasets([stacked])


def make_energy_axis(axis, name="energy"):
Member:

I haven't thought this through, but we could maybe even introduce an API like MapAxis.from_analysis_config(config=) and Maker.from_analysis_config()...
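A rough sketch of what such a factory could look like, written here as a standalone helper; MapAxis.from_energy_bounds is existing Gammapy API, while the method itself and the config field names (min, max, nbins) are only a proposal following the analysis energy-axis settings:

from gammapy.maps import MapAxis

def from_analysis_config(config, name="energy"):
    # would become a classmethod, e.g. MapAxis.from_analysis_config(config=...)
    return MapAxis.from_energy_bounds(
        config.min, config.max, nbin=config.nbins, name=name
    )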

@registerrier (Contributor) left a comment

Thanks @QRemy. This is a very ambitious change, but it is definitely very interesting.

See some inline comments.
To make progress possible and have an intermediate working solution, it might be possible to pass the Analysis object to each step at run time rather than on init.
Would that be OK, @adonath?

gammapy/analysis/config.py: resolved review thread (collapsed)
gammapy/analysis/core.py: outdated, resolved review thread (collapsed)
path = make_path(obs_settings.obs_file)
ids = list(Table.read(path, format="ascii", data_start=0).columns[0])
selected_obs_table = self.datastore.obs_table.select_obs_id(ids)
def run(self, steps=None, overwrite=None, **kwargs):
Contributor:

Add a docstring.

Contributor:

What does the overwrite option do?

Contributor Author:

I had in mind that each AnalysisStep could have a read/write method, but for now this is used only in the DataReductionAnalysisStep to read the datasets if they exist.
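Roughly, the behaviour described here could look like the following sketch; the datasets_file field is mentioned later in this thread, while the rest of the body is an assumption:

from gammapy.datasets import Datasets
from gammapy.utils.scripts import make_path

def run(self, analysis):
    path = make_path(analysis.config.general.datasets_file)
    if path.exists() and not self.overwrite:
        # reuse previously written datasets instead of re-running the reduction
        analysis.datasets = Datasets.read(path)
        return
    # ...otherwise run the data reduction makers and optionally write the datasets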

ids = list(Table.read(path, format="ascii", data_start=0).columns[0])
selected_obs_table = self.datastore.obs_table.select_obs_id(ids)
def run(self, steps=None, overwrite=None, **kwargs):
if steps is None:
Contributor:

Maybe step creation should be done in another method?

Contributor Author:

It means they will have to be kept in memory and attached to the analysis class, which affects where the information is stored. I will try this later, after sorting out the input/output of the analysis steps.
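A hypothetical sketch of what step creation factored into its own method could look like, with the created steps kept in memory on the analysis class as discussed above; the mapping and class names are assumptions, not the final design:

# a simple tag -> step class mapping; the PR introduces a registry for this
ANALYSIS_STEP_CLASSES = {
    "data-reduction": DataReductionAnalysisStep,
    "fit": FitAnalysisStep,
}

def _create_steps(self, steps=None, overwrite=True):
    steps = steps or self.config.general.steps
    # the created steps then live in memory, attached to the Analysis instance
    self.steps = [
        ANALYSIS_STEP_CLASSES[tag](self.config, overwrite=overwrite) for tag in steps
    ]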

def run_fit(self):
"""Fitting reduced datasets to model."""
if not self.models:
def check_datasets(self):
Contributor:

Is dataset reading and/or model creation and setting a specific analysis step?

Contributor Author:

No, but it could be. For now, reading of config.general.datasets_file is done through the data-selection step, if the file exists and overwrite is False.

gammapy/analysis/steps.py: three resolved review threads, two outdated (collapsed)
)
]

self.analysis.check_datasets()
Contributor:

Why is it needed explicitly here?

@QRemy (Contributor Author), Apr 19, 2022:

Removed it for now, but in a future PR I will reintroduce a similar system to check that the data required for each step are well defined.

@adonath modified the milestones: 1.0rc, 1.0 May 5, 2022
@registerrier modified the milestones: 1.0, 1.1 Sep 21, 2022
@registerrier (Contributor) commented:

This PR implements the basis of a large refactoring of the HLI. A number of design choices will have to be made, and a fully expanded and improved HLI is an objective for v2.0.
For now, even though the changes proposed in this PR do not break the API, we decided during the co-working week to postpone this to v1.1 and implement the new HLI in parallel with the existing one.

@QRemy modified the milestones: 1.2, 2.0 Aug 7, 2023