Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] Include the original vertical coordinate as data variable in the transformed Dataset #617

Open
mgorfer opened this issue Sep 8, 2023 · 0 comments

Comments

@mgorfer
Copy link

mgorfer commented Sep 8, 2023

I have an input ds on a vertical pressure coordinate:

<xarray.Dataset>
Dimensions:            (time: 12, pressure: 37, latitude_bins: 36,
                        longitude_bins: 72)
Coordinates:
  * pressure           (pressure) int32 1 2 3 5 7 10 ... 900 925 950 975 1000
  * time               (time) datetime64[ns] 2000-01-01 ... 2000-12-01
  * longitude_bins     (longitude_bins) float64 -177.5 -172.5 ... 172.5 177.5
  * latitude_bins      (latitude_bins) float64 -87.5 -82.5 -77.5 ... 82.5 87.5
Data variables:
    specific_humidity  (time, pressure, latitude_bins, longitude_bins) float32 dask.array<chunksize=(12, 37, 36, 72), meta=np.ndarray>
    temperature        (time, pressure, latitude_bins, longitude_bins) float32 dask.array<chunksize=(12, 37, 36, 72), meta=np.ndarray>
    altitude           (time, pressure, latitude_bins, longitude_bins) float32 dask.array<chunksize=(12, 37, 36, 72), meta=np.ndarray>

I want to transform it to altitude, but also keep the pressure variable in my output. It should look like this then:

<xarray.Dataset>
Dimensions:            (time: 12, longitude_bins: 72, latitude_bins: 36,
                        altitude: 800)
Coordinates:
  * time               (time) datetime64[ns] 2000-01-01 ... 2000-12-01
  * longitude_bins     (longitude_bins) float64 -177.5 -172.5 ... 172.5 177.5
  * latitude_bins      (latitude_bins) float64 -87.5 -82.5 -77.5 ... 82.5 87.5
  * altitude           (altitude) int64 0 100 200 300 ... 79700 79800 79900
Data variables:
    specific_humidity  (time, latitude_bins, longitude_bins, altitude) float32 dask.array<chunksize=(12, 36, 72, 800), meta=np.ndarray>
    temperature        (time, latitude_bins, longitude_bins, altitude) float32 dask.array<chunksize=(12, 36, 72, 800), meta=np.ndarray>
    pressure           (time, latitude_bins, longitude_bins, altitude) float64 dask.array<chunksize=(12, 36, 72, 800), meta=np.ndarray>

Normally, xgcm does not keep the original vertical coordinate in the output. But I need all the data variables and pressure as data variable on a vertical altitude coordinate in my output.

I accomplished this by creating a (WIP) function:

import logging
import numpy as np
from xgcm import Grid

logger = logging.getLogger(__name__)

# Set the default target resolution for transforming to altitude.
TARGET_RESOLUTION = np.arange(0, 80000, 100)

def xgmc_transformation(
    ds: xr.Dataset,
    orig_coord: str = "pressure",
    target_coord: str = "altitude",
    target_resolution: np.ndarray = TARGET_RESOLUTION,
    manual_method: str = "auto",
) -> xr.Dataset:
    """
    Apply a coordinate transformation to the dataset.

    Parameters
    ----------
    ds : xr.Dataset
        The input dataset.
    orig_coord : str, optional
        The original coordinate to transform from. Default is "pressure".
    target_coord : str, optional
        The target coordinate to transform to. Default is "altitude".
    target_resolution :  np.ndarray, optional
        The target resolution to transform to. Default is np.arange(0, 80000, 100).
    manual_method : str, optional (WIP!)
        The chosen manual method for the transformation. Either lin or log.
        Default is auto. If the original vertical coordinate is pressure, auto chooses log.
        In this case, pressure will be prepared for the log/lin interpolation by np.log(pressure).
        It will get antilogged after the transformation. 
        If the original coordinate is altitude, no log is applied. (FIX THIS!)

    Returns
    -------
    xr.Dataset
        The transformed dataset.
    """
    if manual_method != "auto":
        method = manual_method
    elif orig_coord == "altitude":
        method = "linear"
    elif orig_coord == "pressure":
        method = "log"
    else:
        msg = "Method could not be determined"
        raise ValueError(msg)

    if method == "log":
        ds[orig_coord] = np.log(ds[orig_coord])

    logger.info(
        "Tranform Dataset from %s to %s using %s transform",
        orig_coord,
        target_coord,
        method,
    )

    # Add orig_coord along target_coord in the dataset, as it should be in the output
    ds[f"{orig_coord}_var"] = ds[orig_coord].broadcast_like(ds[target_coord])

    grid = Grid(ds, coords={"Z": {"center": orig_coord}}, periodic=False)

    transform_vars = [var_ for var_ in ds.data_vars if var_ != target_coord]

    transform_list = []
    for var_ in transform_vars:
        var_transformed = grid.transform(
            ds[var_],
            "Z",
            target_resolution,
            target_data=ds[target_coord],
        )
        transform_list.append(var_transformed)
    out = xr.merge(transform_list)

    # Rename orig_coord to the original name.
    out = out.rename({f"{orig_coord}_var": orig_coord})

    if method == "log":
        out[orig_coord] = np.exp(out[orig_coord])

    return out

I am not sure if this is an especially elegant approach, especially the part where I do ds[f"{orig_coord}_var"] = ds[orig_coord].broadcast_like(ds[target_coord]), to insert pressure again into the ds as data variable along altitude.

Is what I want to accomplish maybe already implemented in some way in xgcm?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant