Overhaul PISM Configuration Parameters #202

citibeth · 2013-11-12T17:13:22Z

This task is related to three earlier issues. I have opened it to offer design input based on parameter systems I have used in the past.

#175
#144
#190

The simplest parameter systems create an association between keys and
string values, and then allow programs to query it. This approach has
a number of shortcomings:

- Simple user errors (such as misspelling a parameter name) are not
caught.
- There's no easy way to get documentation of what parameters are
required for a particular model run. That is especially true if
different model runs use different sets of components, requiring
different sets of parameters.
- There's no way to specify REQUIRED parameters vs. OPTIONAL parameters.
- There is no type checking on parameters, leaving type checking and
conversion to ad-hoc code by the model programmer.

To address these issues, a somewhat upgraded parameter system is
recommended. Model components should be able to DECLARE what
parameters they use. Each declaration should include a parameter
name, type, documentation string, default value, and whether or not
this parameter is required. For components that may be instantiated
more than once, some kind of scoping or context will be needed so the
"same" parameter can be set differently for different instances of the
component.
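The declaration idea above can be sketched in Python as follows. This is only an illustration under stated assumptions: `ParamDecl`, `ParamRegistry`, the scope strings, and the parameter names are all hypothetical and are not part of PISM. Scoping is modeled by keying each declaration on a `(scope, name)` pair, so two instances of one component can carry different settings.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass(frozen=True)
class ParamDecl:
    """One declared parameter (all field names are hypothetical)."""
    name: str
    ptype: type
    doc: str
    default: Any = None
    required: bool = False


class ParamRegistry:
    """Collects declarations, keyed by (scope, name)."""

    def __init__(self) -> None:
        self._decls: Dict[Tuple[str, str], ParamDecl] = {}

    def declare(self, scope: str, decl: ParamDecl) -> None:
        key = (scope, decl.name)
        if key in self._decls:
            raise ValueError(f"parameter {scope}.{decl.name} declared twice")
        self._decls[key] = decl

    def lookup(self, scope: str, name: str) -> ParamDecl:
        return self._decls[(scope, name)]

    def document(self, scope: Optional[str] = None) -> str:
        """One documentation line per declaration, optionally for one scope."""
        lines = []
        for (s, name), d in sorted(self._decls.items()):
            if scope is None or s == scope:
                kind = "required" if d.required else f"optional, default={d.default!r}"
                lines.append(f"{s}.{name} ({d.ptype.__name__}, {kind}): {d.doc}")
        return "\n".join(lines)


registry = ParamRegistry()
# Two instances of the "same" component get distinct scopes.
for scope in ("ocean.north", "ocean.south"):
    registry.declare(scope, ParamDecl("melt_rate", float, "Sub-shelf melt rate", default=0.1))
registry.declare("atmosphere", ParamDecl("lapse_rate", float, "Temperature lapse rate", required=True))

print(registry.document("atmosphere"))
# atmosphere.lapse_rate (float, required): Temperature lapse rate
```

Keeping the declarations as plain data is what makes the later services (documentation, validation, dumping) possible without touching component code.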

When the system is initialized, all possible components will be called
to declare their parameters. This allows the parameter system to
provide the following useful services:

a) Print up-to-date documentation on ALL available parameters.
Filter that documentation by model component (for example, some
parameters may or may not be needed, depending on whether the
atmosphere component is used in this run).

b) Determine that a model component does not have the required
parameters it needs to run, and abort the simulation with an
appropriate error message before it starts.

c) Typecheck all parameter values provided by the user, and abort
with an error message if there are any problems.

d) Check for any EXTRA parameters that weren't expected. This could
be a sign that the user was trying to do something, but got it
wrong.

e) Identify parameters that the user is trying to set, but that were
never registered. Thus, a class of user error is eliminated.

f) Write out ALL relevant parameters at the beginning of a model
run, allowing for a concise description of WHAT the parameter
settings were for a particular run. This makes it easy to re-do
this model run in the future, by just loading back the auto-written
parameter file.
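Services (b) through (f) could look roughly like the sketch below. It assumes a hypothetical table of declarations (`DECLARED`) and invented parameter names; it is not PISM code. All problems are collected in one pass so the user sees every error at once, and the caller would abort before the run if the error list is non-empty.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical declarations: name -> (type, required, default).
DECLARED: Dict[str, Tuple[type, bool, Any]] = {
    "ice.density": (float, True, None),
    "ice.flow_law": (str, False, "glen"),
    "run.years": (int, True, None),
}


def validate(user: Dict[str, str]) -> Tuple[Dict[str, Any], List[str]]:
    """Convert user-supplied strings and collect every problem found."""
    errors: List[str] = []
    values: Dict[str, Any] = {}

    # (d)/(e): names the user set that no component declared.
    for name in sorted(set(user) - set(DECLARED)):
        errors.append(f"unknown parameter: {name}")

    for name, (ptype, required, default) in DECLARED.items():
        if name not in user:
            if required:
                errors.append(f"missing required parameter: {name}")  # (b)
            else:
                values[name] = default
            continue
        try:
            values[name] = ptype(user[name])  # (c): type check by conversion
        except ValueError:
            errors.append(f"{name}: expected {ptype.__name__}, got {user[name]!r}")

    return values, errors


def dump(values: Dict[str, Any]) -> str:
    """(f): write the effective settings as KEY=VALUE lines."""
    return "\n".join(f"{k}={v}" for k, v in sorted(values.items()))


# A misspelled name is reported twice: once as unknown, once as missing.
_, errors = validate({"ice.density": "910", "run.yeers": "100"})
print(errors)

values, errors = validate({"ice.density": "910", "run.years": "100"})
print(dump(values))
```

The dumped text uses the same KEY=VALUE form a user would write, so it can be fed back in to repeat the run.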

How does one set parameters? They can be set in a configuration file
or on the command line. There should be a standard way to set
parameters in both places, based on the name of the parameter -- the
relation between parameter names and command line arguments to set
them should NOT be arbitrary at all.
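One way to keep the mapping non-arbitrary is to generate the command-line options mechanically from the declared names. The sketch below uses Python's standard `argparse`; the parameter table and the `--` prefix rule are assumptions chosen for illustration, not PISM's actual option syntax.

```python
import argparse
from typing import Any, Dict, List

# Hypothetical parameters: name -> (type, default).
PARAMS = {
    "ice.density": (float, 910.0),
    "run.years": (int, 100),
}


def option_name(param: str) -> str:
    """The single fixed rule: prefix plus the parameter name, unchanged."""
    return "--" + param


def build_parser() -> argparse.ArgumentParser:
    # allow_abbrev=False: an option must match its parameter name exactly.
    parser = argparse.ArgumentParser(allow_abbrev=False)
    for name, (ptype, default) in PARAMS.items():
        parser.add_argument(option_name(name), dest=name, type=ptype, default=default)
    return parser


def parse(argv: List[str]) -> Dict[str, Any]:
    """Return a {parameter name: value} dictionary for the given arguments."""
    return vars(build_parser().parse_args(argv))


print(parse(["--run.years", "5"]))
# {'ice.density': 910.0, 'run.years': 5}
```

Because options are derived from the table, adding a parameter automatically adds its option, and there is nothing to document separately.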

One should consider setting things up so parameters are set with a
Python script, instead of in some kind of configuration file for which
one must write a custom parser. This can save a lot of effort. It
also addresses issues when people want to start doing more complicated
things. For example:

TOP_HEIGHT = 5
BOTTOM_HEIGHT = 3
NEXT_LAYER = TOP_HEIGHT + 4

Once one has integrated Python this far, it makes sense to ask whether
the C++ main() should be eliminated altogether. One can bind the
top-level components (ice, atmosphere, ocean, etc) into Python and
then replace pismr.cc with pismr.py. However, one advantage of a
simple KEY=VALUE kind of parameter file is it is NOT Turing complete.
Once you make your parameter language Turing complete, it becomes
harder to automatically process parameter files (which would then just
be general Python scripts). A good balance needs to be found here.
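One possible middle ground, sketched below as an assumption rather than a proposal for PISM itself, is to accept Python syntax but evaluate only a small subset: simple assignments, numeric constants, references to earlier keys, and basic arithmetic. The file is parsed with the standard `ast` module and never executed, so it stays easy to process automatically. The function names are hypothetical.

```python
import ast
import operator
from typing import Dict, Union

Number = Union[int, float]

# Only these arithmetic operators are accepted; anything else is rejected.
_BINOPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node: ast.AST, env: Dict[str, Number]) -> Number:
    """Evaluate a numeric expression tree against previously defined keys."""
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.Name):
        if node.id not in env:
            raise ValueError(f"undefined name: {node.id}")
        return env[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
        left = _evaluate(node.left, env)
        right = _evaluate(node.right, env)
        return _BINOPS[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_evaluate(node.operand, env)
    raise ValueError(f"unsupported syntax: {type(node).__name__}")


def load_params(text: str) -> Dict[str, Number]:
    """Parse KEY = EXPRESSION lines; calls, loops and imports are rejected."""
    env: Dict[str, Number] = {}
    for stmt in ast.parse(text).body:
        if not (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)):
            raise ValueError("only simple KEY = EXPRESSION lines are allowed")
        env[stmt.targets[0].id] = _evaluate(stmt.value, env)
    return env


params = load_params("TOP_HEIGHT = 5\nBOTTOM_HEIGHT = 3\nNEXT_LAYER = TOP_HEIGHT + 4\n")
print(params)
# {'TOP_HEIGHT': 5, 'BOTTOM_HEIGHT': 3, 'NEXT_LAYER': 9}
```

Anything outside the whitelist, such as a function call or an `import`, raises `ValueError` instead of running.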

ckhroulev · 2013-11-12T18:09:00Z

Bob,

Thanks for your input!

I have been thinking about this for quite some time and I am, in fact, working on an "upgrade" that will resolve all the issues you mention.

Regarding replacing main() with Python code: I have considered using Python for non-computationally-intensive parts of PISM. I don't think we are ready for this.

Here's why. Yes, scripting language wrappers for stable libraries tend to work well and do save time. PISM is neither stable nor is it a library, although it is getting there gradually. Wrapping code whose API is in flux makes both debugging and maintenance harder. This is obviously not what we want.

PS: PISM's inversion modeling tools use a fairly small and rather stable part of PISM, so in this case using Python wrappers seems to pay off.

ckhroulev · 2014-02-11T03:33:46Z

A note to myself: see the bb4e9cb commit message.

See bb4e9cb. Command-line options should map directly to configuration parameters, but some parameters have shortened command-line options. All these shorter options are documented in the manual. All the options *not* documented in the manual match corresponding configuration parameters *exactly*. We still have ~25 undocumented options. They will be taken care of once I get to #202.

ckhroulev · 2014-03-06T15:22:46Z

See #248 (comment)

…t CF compliance. This minor problem arises because pism_config is now included in output files. The CF convention document excludes hyphens: "Variable, dimension and attribute names should begin with a letter and be composed of letters, digits, and underscores. Note that this is in conformance with the COARDS conventions, but is more restrictive than the netCDF interface which allows use of the hyphen character."
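The naming rule quoted above is easy to check mechanically. The following is a minimal sketch; the function names are hypothetical, and replacing hyphens with underscores is only one possible renaming policy.

```python
import re

# Begins with a letter; then letters, digits and underscores only.
_CF_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_cf_name(name: str) -> bool:
    """True if the whole name satisfies the quoted CF naming rule."""
    return _CF_NAME.fullmatch(name) is not None


def to_cf_name(name: str) -> str:
    """Hypothetical fix-up: replace hyphens, which CF excludes, by underscores."""
    return name.replace("-", "_")


print(is_cf_name("ice-density"))              # False
print(is_cf_name(to_cf_name("ice-density")))  # True
```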

This commit moves some code from a private repository of mine into PISM. This code was *not* added to the build system yet. The ConfigJSON class requires the Jansson library (https://github.com/akheron/jansson). I will add Jansson to the PISM source code tree as a Git submodule. (I think we should also fork it under pism/jansson on GitHub.) Luckily Jansson uses a license that allows such use and does not conflict with GPL. It also uses CMake as a build system, so we can integrate it into PISM with very little effort. The JSON configuration file validator (validate_config.py) requires the jsonschema Python module (https://pypi.python.org/pypi/jsonschema). All of this is "alpha-quality" code and will change a lot. In particular, it will use upcoming changes in the PISM error handling code. The pism_config.json is (approximately) what a PISM config file will look like. It is a bit out of date (does not include some parameters that were added recently) and does not include command-line options yet. *But* we will be able to automatically generate documentation for all flags, parameters, and command-line options, so such documentation will always be up to date.

This was referenced Jan 29, 2014

Clean up command-line options #144

Closed

NCConfigVariables --> PISMConfig #234

Closed
