Overhaul PISM Configuration Parameters #202

Open
citibeth opened this issue Nov 12, 2013 · 3 comments

@citibeth

This task is related to three earlier issues. I have opened it to provide design input based on parameter systems I've used in the past.

#175
#144
#190

The simplest parameter systems create an association between keys and
string values, and then allow programs to query it. This approach has
a number of shortcomings:

  1. Simple user errors (such as misspelling of parameter name) are not
    caught.
  2. There's no easy way to get documentation of what parameters are
    required for a particular model run. That is especially true if
    different model runs use different sets of components, requiring
    different sets of parameters.
  3. There's no way to specify REQUIRED parameters vs. OPTIONAL parameters.
  4. There is no typechecking on parameters, leaving type checking and
    conversion to ad-hoc code by the model programmer.

To address these issues, a somewhat more capable parameter system is
recommended. Model components should be able to DECLARE what
parameters they use. Each declaration should include a parameter
name, type, documentation string, default value, and whether or not
this parameter is required. For components that may be instantiated
more than once, some kind of scoping or context will be needed so the
"same" parameter can be set differently for different instances of the
component.
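
As a rough illustration of what such declarations could look like, here is
a minimal Python sketch. The names ParamDecl, Registry, and the
"ocean.upper" / "ocean.lower" scopes are all hypothetical and are not part
of PISM; the scope string is just one possible way to let two instances of
the same component carry the "same" parameter with different settings.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass(frozen=True)
class ParamDecl:
    """One declared parameter (hypothetical structure, not PISM's)."""
    name: str
    kind: type
    doc: str
    default: Any = None
    required: bool = False


class Registry:
    """Collects declarations keyed by (scope, parameter name)."""

    def __init__(self) -> None:
        self._decls: Dict[Tuple[str, str], ParamDecl] = {}

    def declare(self, scope: str, decl: ParamDecl) -> None:
        key = (scope, decl.name)
        if key in self._decls:
            raise ValueError(f"parameter {scope}.{decl.name} declared twice")
        self._decls[key] = decl

    def lookup(self, scope: str, name: str) -> ParamDecl:
        return self._decls[(scope, name)]

    def document(self, scope: Optional[str] = None) -> str:
        """Return one documentation line per parameter, optionally for one scope."""
        lines = []
        for (s, name), d in sorted(self._decls.items()):
            if scope is not None and s != scope:
                continue
            status = "required" if d.required else f"optional, default={d.default!r}"
            lines.append(f"{s}.{name} ({d.kind.__name__}, {status}): {d.doc}")
        return "\n".join(lines)


# Two instances of the same (made-up) component, each with its own scope.
registry = Registry()
registry.declare("ocean.upper",
                 ParamDecl("layer_height", float, "Layer height in meters", required=True))
registry.declare("ocean.lower",
                 ParamDecl("layer_height", float, "Layer height in meters", default=3.0))
```

Calling registry.document() with no argument lists every declared
parameter; passing a scope restricts the listing to one component instance.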

When the system is initialized, all possible components will be called
to declare their parameters. This allows the parameter system to
provide the following useful services:

a) Print up-to-date documentation on ALL available parameters.
Filter that documentation by model component (for example, some
parameters may or may not be needed, depending on whether the
atmosphere component is used in this run).

b) Determine that a model component does not have the required
parameters it needs to run, and abort the simulation with an
appropriate error message before it starts.

c) Typecheck all parameter values provided by the user, and abort
with an error message if there are any problems.

d) Check for any EXTRA parameters that weren't expected. This could
be a sign that the user was trying to do something, but got it
wrong.

e) Identify parameters that the user is trying to set, but that were
never registered. Thus, a class of user error is eliminated.

f) Write out ALL relevant parameters at the beginning of a model
run, allowing for a concise description of WHAT the parameter
settings were for a particular run. Makes for an easy way to re-do
this model run in the future, by just loading back the auto-written
parameter file.
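
Services (b) through (f) can be sketched in a few lines of Python. This is
only an illustration under stated assumptions: the DECLARED table, the
parameter names in it, and the validate/dump helpers are invented for the
example and do not reflect PISM's actual parameters or code.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical declarations: name -> (type, required, default).
DECLARED: Dict[str, Tuple[type, bool, Any]] = {
    "ice.density": (float, True, None),
    "ice.max_steps": (int, False, 100),
    "atmosphere.enabled": (bool, False, False),
}


def convert(text: str, kind: type) -> Any:
    """Convert a user-supplied string to the declared type."""
    if kind is bool:
        lowered = text.strip().lower()
        if lowered in ("true", "yes", "1"):
            return True
        if lowered in ("false", "no", "0"):
            return False
        raise ValueError(f"not a boolean: {text!r}")
    return kind(text)


def validate(user: Dict[str, str]) -> Tuple[Dict[str, Any], List[str]]:
    """Return (resolved values, error messages) for user-supplied strings."""
    errors: List[str] = []
    resolved: Dict[str, Any] = {}
    # (d)/(e): anything the user set that no component declared.
    for name in sorted(set(user) - set(DECLARED)):
        errors.append(f"unknown parameter: {name}")
    for name, (kind, required, default) in DECLARED.items():
        if name not in user:
            if required:
                errors.append(f"missing required parameter: {name}")  # (b)
            else:
                resolved[name] = default
            continue
        try:
            resolved[name] = convert(user[name], kind)  # (c)
        except ValueError as exc:
            errors.append(f"bad value for {name}: {exc}")
    return resolved, errors


def dump(resolved: Dict[str, Any]) -> str:
    """(f): write out every resolved parameter in a reloadable KEY = VALUE form."""
    return "\n".join(f"{name} = {resolved[name]!r}" for name in sorted(resolved))
```

A driver would call validate() once before the run starts, print all
collected errors and abort if the list is non-empty, and otherwise write the
output of dump() next to the model results.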

How does one set parameters? They can be set in a configuration file
or on the command line. There should be a standard way to set
parameters in both places, based on the name of the parameter -- the
relation between parameter names and command line arguments to set
them should NOT at all be arbitrary.
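
One way to make that relation mechanical is to generate the command-line
options from the declared parameter names. The sketch below uses Python's
argparse; the PARAMS table and the "--" + name convention are assumptions
made for illustration, not PISM's actual option syntax.

```python
import argparse
from typing import Any, Dict, List

# Hypothetical declared parameters: name -> type.
PARAMS: Dict[str, type] = {
    "ice.density": float,
    "ice.max_steps": int,
}


def option_name(param: str) -> str:
    """The only rule relating a parameter name to its command-line option."""
    return "--" + param


def build_parser() -> argparse.ArgumentParser:
    # allow_abbrev=False so that only the exact parameter name is accepted.
    parser = argparse.ArgumentParser(allow_abbrev=False)
    for name, kind in PARAMS.items():
        parser.add_argument(option_name(name), dest=name, type=kind, default=None)
    return parser


def parse(argv: List[str]) -> Dict[str, Any]:
    """Return only the parameters that were actually set on the command line."""
    namespace = build_parser().parse_args(argv)
    return {name: value for name, value in vars(namespace).items() if value is not None}
```

Because every option is derived from a parameter name by option_name(),
documentation for the options can be generated from the same table.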

One should consider setting things up so parameters are set with a
Python script, instead of in some kind of configuration file for which
one must write a custom parser. This can save a lot of effort. It
also addresses issues when people want to start doing more complicated
things. For example:

TOP_HEIGHT = 5
BOTTOM_HEIGHT = 3
NEXT_LAYER = TOP_HEIGHT + 4
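
A minimal sketch of how a program could read such a file, assuming the
convention (invented here) that every upper-case name in the script is a
parameter. The function name load_python_config is hypothetical.

```python
from typing import Any, Dict

CONFIG_SOURCE = """
TOP_HEIGHT = 5
BOTTOM_HEIGHT = 3
NEXT_LAYER = TOP_HEIGHT + 4
"""


def load_python_config(source: str) -> Dict[str, Any]:
    """Run the script and keep every upper-case name as a parameter."""
    namespace: Dict[str, Any] = {}
    # Note: exec runs arbitrary code, so this is only suitable for trusted files.
    exec(compile(source, "<config>", "exec"), namespace)
    return {name: value for name, value in namespace.items() if name.isupper()}


params = load_python_config(CONFIG_SOURCE)
```

No custom parser is needed, and expressions such as TOP_HEIGHT + 4 are
evaluated by Python itself.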

Once one has integrated Python this far, it makes sense to ask whether
the C++ main() should be eliminated altogether. One can bind the
top-level components (ice, atmosphere, ocean, etc) into Python and
then replace pismr.cc with pismr.py. However, one advantage of a
simple KEY=VALUE kind of parameter file is that it is NOT Turing complete.
Once you make your parameter language Turing complete, it becomes
harder to automatically process parameter files (which would then just
be general Python scripts). A good balance needs to be found here.
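
One possible middle ground, sketched here purely as an illustration, is a
KEY = VALUE file whose values may be arithmetic expressions over earlier
keys, but nothing more. The sketch uses Python's ast module to accept only
numbers, names, and + - * /; the function names are hypothetical.

```python
import ast
import operator
from typing import Dict, Union

Number = Union[int, float]

_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _eval(node: ast.AST, env: Dict[str, Number]) -> Number:
    """Evaluate a restricted arithmetic expression; reject everything else."""
    if (isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        return node.value
    if isinstance(node, ast.Name):
        if node.id not in env:
            raise ValueError(f"undefined parameter: {node.id}")
        return env[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left, env), _eval(node.right, env))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand, env)
    raise ValueError("unsupported expression in parameter file")


def parse_params(text: str) -> Dict[str, Number]:
    """Parse KEY = EXPRESSION lines; '#' starts a comment."""
    env: Dict[str, Number] = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        key, sep, expr = line.partition("=")
        if not sep:
            raise ValueError(f"line {lineno}: expected KEY = VALUE")
        env[key.strip()] = _eval(ast.parse(expr.strip(), mode="eval").body, env)
    return env
```

There are no loops, calls, or imports in this little language, so a
parameter file can still be processed automatically, while simple derived
values remain possible.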

@ckhroulev

Bob,

Thanks for your input!

I have been thinking about this for quite some time and I am, in fact, working on an "upgrade" that will resolve all the issues you mention.

Regarding replacing main() with Python code: I have considered using Python for non-computationally-intensive parts of PISM. I don't think we are ready for this.

Here's why. Yes, scripting language wrappers for stable libraries tend to work well and do save time. PISM is neither stable nor is it a library, although it is getting there gradually. Wrapping code whose API is in flux makes both debugging and maintenance harder. This is obviously not what we want.

PS: PISM's inversion modeling tools use a fairly small and rather stable part of PISM, so in this case using Python wrappers seems to pay off.

@ckhroulev

A note to myself: see the bb4e9cb commit message.

ckhroulev added a commit that referenced this issue Feb 13, 2014
See bb4e9cb.

Command-line options should map directly to configuration parameters,
but some parameters have shortened command-line options. All these
shorter options are documented in the manual. All the options *not*
documented in the manual match corresponding configuration
parameters *exactly*.

We still have ~25 undocumented options. They will be taken care of once
I get to #202.
@ckhroulev

See #248 (comment)

ckhroulev referenced this issue May 7, 2014
…t CF compliance.

This minor problem arises because pism_config is now included in output files.

The CF convention document excludes hyphens:  "Variable, dimension and attribute
names should begin with a letter and be composed of letters, digits, and
underscores. Note that this is in conformance with the COARDS conventions,
but is more restrictive than the netCDF interface which allows use of the hyphen
character."
ckhroulev added a commit that referenced this issue May 21, 2014
This commit moves some code from a private repository of mine into PISM.

This code was *not* added to the build system yet.

The ConfigJSON class requires the Jansson
library (https://github.com/akheron/jansson). I will add Jansson to the
PISM source code tree as a Git submodule. (I think we should also fork
it under pism/jansson on GitHub.) Luckily Jansson uses a license that
allows such use and does not conflict with GPL. It also uses CMake as a
build system, so we can integrate it into PISM with very little effort.

The JSON configuration file validator (validate_config.py) requires the
jsonschema Python module (https://pypi.python.org/pypi/jsonschema).

All of this is "alpha-quality" code and will change a lot. In particular, it
will use upcoming changes in the PISM error handling code.

The pism_config.json is (approximately) what a PISM config file will
look like. It is a bit out of date (does not include some parameters
that were added recently) and does not include command-line options
yet. *But* we will be able to automatically generate documentation for
all flags, parameters, and command-line options, so such documentation
will always be up to date.