Add solution downsampling feature #537

Open · wants to merge 3 commits into master
Conversation

aymkhalil (Contributor)

This PR enables downsampling a solution, to save write time and disk space when the solution is written out to files. It uses local averaging over user-defined averaging factors along each axis:

  1. To enable downsampling, simply pass a tuple of downsampling factors to the controller. For example, in 3D (a fuller usage sketch follows this list):

     claw = pyclaw.Controller()
     claw.downsampling_factors = (2, 2, 2)

  2. The downsampling factors should evenly divide the grid in each dimension.
  3. An optional dependency on scikit-image is introduced only for users who wish to use this feature. If that dependency is undesirable, we could implement our own local averaging method.
  4. Test cases are yet to be added.
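For context, a minimal end-to-end sketch of the intended usage. This is standard 1D advection boilerplate from the PyClaw examples; only the downsampling_factors line comes from this PR, and the 1D factor tuple (2,) is assumed by analogy with the 3D example above:

    import numpy as np
    from clawpack import pyclaw, riemann

    solver = pyclaw.ClawSolver1D(riemann.advection_1D)
    solver.bc_lower[0] = pyclaw.BC.periodic
    solver.bc_upper[0] = pyclaw.BC.periodic

    x = pyclaw.Dimension(0.0, 1.0, 100, name='x')
    domain = pyclaw.Domain(x)
    state = pyclaw.State(domain, solver.num_eqn)
    state.problem_data['u'] = 1.0  # advection speed
    xc = state.grid.x.centers
    state.q[0, :] = np.exp(-100.0 * (xc - 0.5) ** 2)

    claw = pyclaw.Controller()
    claw.solution = pyclaw.Solution(state, domain)
    claw.solver = solver
    claw.tfinal = 1.0
    claw.downsampling_factors = (2,)  # attribute added by this PR: 100 cells -> 50 on disk
    claw.run()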

ketch (Member) commented Feb 21, 2016

I have looked over this with @aymkhalil and it seems like a good first step. In terms of design, a downside is that it requires creating new Solution, State, and DA objects. But it has the advantage that no modifications are needed to the read and write functions themselves.

One important question is whether to keep the scikit-image optional dependency or replace it with some complicated numpy code (which seems like it would necessarily introduce a copy, at least in 3D).
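For reference, a pure-numpy sketch of the local mean (not the PR's implementation): assuming C-contiguous arrays and factors that evenly divide each axis, the block reshape below is itself a view, so the only new allocation is the reduced output array. Whether that counts as avoiding the copy in question is part of what would need checking.

    import numpy as np

    def block_mean(q, factors):
        # Hedged sketch, not code from this PR. Assumes q is C-contiguous
        # and each factor evenly divides its axis length, in which case
        # the block reshape below is a view, not a copy.
        shape = []
        for n, f in zip(q.shape, factors):
            assert n % f == 0, "factors must evenly divide the grid"
            shape.extend([n // f, f])
        # average over the interleaved block axes (1, 3, 5, ...)
        return q.reshape(shape).mean(axis=tuple(range(1, len(shape), 2)))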

An obvious next step after this is to allow different types of output (e.g. "checkpoints" and smaller ones) with different frequencies.

@mandli, any comments?

mandli (Member) commented Feb 21, 2016

Reviewing; I will try to send some feedback by tonight. One thought after reading through the PR once: there are a couple of spots that deviate from Python coding style and could be corrected easily. I will try to point these out in code comments.

Review comment on the diff (context):

    - ds_domain_ranges: global ranges of the downsampled solution
    """

    from skimage.transform import downscale_local_mean
Can we add some specifics as to what downscale_local_mean is doing? For instance, what is the stencil size? At the very least, note in the doc-string that the function comes from scikit-image.

mandli (Member) commented Feb 22, 2016

I think this generally looks good. I would suggest adding tests, since this is a fairly complex procedure, especially in parallel. I also wonder whether this functionality should be implemented as a separate function that takes a solution and returns the downsampled solution. I only suggest this because most of the functionality ends up passing through self anyway, and I could see some merit in making this its own module in which we could implement a few different types of downsampling. For instance, if I wanted to downsample by simply striding through the array, it would be great to have a common interface to this functionality (see the sketch below).
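For illustration, one possible shape for such a common interface. The names are hypothetical, and it operates on a bare array rather than a Solution object, purely to keep the sketch self-contained:

    import numpy as np

    def downsample(q, factors, method='local_mean'):
        # Hypothetical interface sketch, not code from this PR.
        if method == 'local_mean':
            shape = []
            for n, f in zip(q.shape, factors):
                shape.extend([n // f, f])
            return q.reshape(shape).mean(axis=tuple(range(1, len(shape), 2)))
        elif method == 'stride':
            # keep every factors[i]-th point along axis i; cheap, but
            # can alias features that the local mean would smooth out
            return q[tuple(slice(None, None, f) for f in factors)]
        raise ValueError("unknown downsampling method: %r" % (method,))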

aymkhalil (Contributor, Author)

@mandli I'm working on the test cases and your comments and will push the new modifications soon.

Regarding your last comment about implementing the functionality in a separate module: I agree. However, I wanted the downsampled version of the solution to persist in memory as long as the original solution object lives, to avoid the overhead of re-initializing the new DA objects if the write function is called several times during the simulation. Implementing a different downsampling technique will require some refactoring, I agree, but at least the averaging method (which is performed locally only) is already isolated in downscale_local_mean, which I am working on replacing with custom code (most likely in util.py).

That leaves _downsample_global_to_local, which handles the mapping of ranges between the original and the downsampled solution objects. It sits between constructing the additional DA objects and calling the local averaging function itself. I'll give it another thought and see whether I can move this logic out of the solution object; I really think it is reusable by any technique based on a "rolling window" style of downsampling.

Anyway, let's have another round of comments after I push my changes.

mandli (Member) commented Feb 22, 2016

@aymkhalil You make some good points, and the core routine, downscale_local_mean, is really the one we would most likely want to replace. As a particular test case of what we may want to support, take an application where the aux array's consistency with q is important. This happens in GeoClaw with bathymetry data and is quite sensitive: for instance, using linear interpolation near the shore can cause water to get where it is not supposed to be. Nothing to be particularly concerned about now, perhaps, but something to keep in mind when designing the interface.

I have some concern regarding memory pressure, but as that is usually not an issue for us I am not too worried about it.

aymkhalil (Contributor, Author)

  1. I pushed some test cases and added a couple of doc-strings.
  2. I couldn't find a better implementation than the one in scikit-image for computing the local average. What it does, basically, is first view the array to be downsampled as blocks of the size of the downsampling factors, then apply the averaging function (or any NumPy universal function) to each block.
     The key method is view_as_blocks:
>>> import numpy as np
>>> from skimage.util.shape import view_as_blocks
>>> a = np.arange(4*4).reshape(4,4)
>>> a
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

>>> view_as_blocks(a, (2, 2))
array([[[[ 0,  1],
         [ 4,  5]],

        [[ 2,  3],
         [ 6,  7]]],

       [[[ 8,  9],
         [12, 13]],

        [[10, 11],
         [14, 15]]]])

If the scikit-image dependency is undesirable, we could just copy the functions we need, e.g. the view_as_blocks method:
https://github.com/scikit-image/scikit-image/blob/master/skimage/util/shape.py#L10-L104

Then we could do the following:

>>> a = view_as_blocks(a, (2, 2))
>>> a.mean(-1).mean(-1)
array([[  2.5,   4.5],
       [ 10.5,  12.5]])

However, other methods in scikit-image may come in really handy, especially if we want to generalize this feature.

  3. Refactoring is required to support a generic downsampling feature. It would be great to allow users to set their desired factors and method:

     claw.ds_factors = (2, 2, 2)
     claw.ds_method = 'local_mean'

Some methods may require non-overlapping blocks, overlapping windows, padding, etc., and this needs more discussion as to which methods are expected and what techniques they have in common, as @mandli has pointed out (see the padding sketch below). @ketch, do you want this as part of this PR?
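To make the padding point concrete, a small sketch (assuming edge replication is an acceptable way to extend the grid; pad_to_multiple is a made-up helper name, not from this PR):

    import numpy as np

    def pad_to_multiple(q, factors):
        # pad each axis up to the next multiple of its factor so that
        # block-based methods still apply to non-divisible grids
        pad = [(0, (-n) % f) for n, f in zip(q.shape, factors)]
        return np.pad(q, pad, mode='edge')

    q = np.arange(10.0).reshape(5, 2)
    print(pad_to_multiple(q, (2, 2)).shape)  # (6, 2): one replicated row appended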

mandli (Member) commented Feb 23, 2016

Thanks for putting in the tests; just a few more comments:

  • I think it's fine to depend on scikit-image as long as we do not run afoul of Travis limits.
  • Minor detail: can you wrap the lines that go past 80 columns?
  • I think we can move ahead with this PR while creating an issue to add the flexibility you suggest. As long as the changes needed are mostly at a lower level and do not touch the interface, we should be fine.
