Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: write out a prose narrative of the proposed design #14

Draft
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

tacaswell
Copy link
Member

No description provided.

docs/source/design.rst Outdated Show resolved Hide resolved
@tacaswell
Copy link
Member Author

I also discovered that mathjax does not work with singlehtml (it builds cleanl, adds the math tags etc but does not add mathjax to the output file) and I was getting local build issues due to the mpl-sphinx-theme.

We should decide if want to:

  1. want to not use any math in the docs
  2. switch to normal html (which it might be getting long enough we need to do that anyway)
  3. fix the mathjax extension to work with singlehtml

@tacaswell
Copy link
Member Author

Other relevant links:

Copy link
Member

@story645 story645 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mostly grammar/spelling

docs/source/design.rst Outdated Show resolved Hide resolved
docs/source/design.rst Outdated Show resolved Hide resolved
docs/source/design.rst Outdated Show resolved Hide resolved
docs/source/design.rst Outdated Show resolved Hide resolved

By accessing all of the data that is needed in draw in a single function call
the ``DataContainer`` instances can ensure that the data is coherent and
consistent. This is important for applications like steaming where different
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
consistent. This is important for applications like steaming where different
consistent. This is important for applications like streaming where different

the backend renderer understand. This can range from coordinate
transformations (as with the ``Transfrom`` stack operations on *x* and *y* like
values), representation conversions (like named colors to RGB values), mapping
stings to a set of objects (like named markershape), to paraaterized type
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
stings to a set of objects (like named markershape), to paraaterized type
stings to a set of objects (like named markershape), to parameterized type

Comment on lines 196 to 216
This will open up paths to do a number of nice things such as multi-variate
color maps, lines who's width and color vary along their length, constant but
parameterized colors and linestyles, and a version of ``scatter`` where the
marker shape depends on the data. All of these things are currently possible
in Matplotlib, but require significant work before calling Matplotlib and can
be very difficult to update after the fact.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about explicitly using the language of vectorizing color, shape, size, etc mapping since that's the language we've been using internally?

Comment on lines 220 to 236
route of hashing all of the data is not a sustainable path (in the case even
modestly sized data the time to hash the data will quickly out-strip any
possible time savings doing the cache lookup!). The proposed ``query`` method
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in this case?

the transform and size passed in.

The choice to return the data and cache key in one step, rather than be a two
step process is drive by simplicity and because the cache key is computed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
step process is drive by simplicity and because the cache key is computed
step process is driven by simplicity and because the cache key is computed

The choice to return the data and cache key in one step, rather than be a two
step process is drive by simplicity and because the cache key is computed
inside of the ``query`` call. If computing the cache key is fast and the data
to be returned in "reasonable" for the machine Matplotlib is running on (it
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
to be returned in "reasonable" for the machine Matplotlib is running on (it
to be returned is "reasonable" for the machine Matplotlib is running on (it

data_prototype/wrappers.py Outdated Show resolved Hide resolved
docs/source/design.rst Outdated Show resolved Hide resolved
docs/source/design.rst Outdated Show resolved Hide resolved
docs/source/design.rst Outdated Show resolved Hide resolved
docs/source/design.rst Outdated Show resolved Hide resolved
@jklymak
Copy link
Member

jklymak commented Nov 4, 2022

This is super helpful for me.

A lot of this design has to do with holding numpy arrays, and being able to subsample them on the fly, and carry some meta information around about the numpy arrays. I think that is great and will be a huge step forward for us. Where I'm still unclear is our interface with other data types and units.

How do we tell downstream or adjacent libraries how we will decode their objects, both just to get the numpy array, and to do unit conversion? Specifying this seems pretty key to me, and maybe is beyond just our remit. Pandas, xarray, have things like obj.to_numpy(), obj.values, and I think they have __array__ representations. I think we largely expect np.asarray(obj) to work. But that kills units.

Unit support is confusing as well. I think the confusion comes from whether elements of arrays should have units and that is where we detect the dtype, or whether the array itself should have the units. I feel we should put our foot down: I think arrays should have units, not array elements. I think that leaves lists of pint objects out in the cold, and I'm not even sure it works with the jpl_units interface, but I pretty strongly feel there is no reason to have arrays with mixed units, and that parsing unit information from either the array or its elements is far too confusing.

However, that has the same issue in that we need a standard for how arrays hold their unit information, or how we can access it. Or we just natively support object arrays (categorical) and np.datetime64 arrays (dates) and let downstream libraries write their own converters like we do now. However, if we do that, do we also make the conversion interface the way the downstream libraries extract the numpy array? Then we don't need to guess how they turn their object into an array. Note, that we don't do this presently.

You have probably thought through some of the above, but I think it's hard because unlike much of the rest of this proposal, it involves the interface with other projects, and probably the conversation should start soon.

@tacaswell tacaswell force-pushed the doc_design branch 2 times, most recently from 6301985 to 25f02ba Compare November 15, 2022 23:09
@tacaswell
Copy link
Member Author

I think that is great and will be a huge step forward for us. Where I'm still unclear is our interface with other data types and units.

Everything goes under the DataContainer wrapper so it has a consistent API. We have not gotten to the design of the helper-functions to make fabricating these objects easy yet. I think one of the persistent issues with Matplotlib as it is now is that too much of the "lets make it easy to use" leaked down into the core of the library which makes it overly complicated and both harder to use and maintain in the long term.

Unit support is confusing as well. I think the confusion comes from whether elements of arrays should have units and that is where we detect the dtype, or whether the array itself should have the units.

I think this is primarily a problem with the auto-detection of units. Once we know how to handle a given input we should no care how the units are actually carried. That said, I have hopes that the numpy dytpe work will flatten much of the diversity in this space from our point of view.

@jklymak
Copy link
Member

jklymak commented Nov 15, 2022

Everything goes under the DataContainer wrapper so it has a consistent API. We have not gotten to the design of the helper-functions to make fabricating these objects easy yet.

I guess thats what I'm asking: who makes the DataContainer? I guess the way we do things now is that we ask the converter to do it, so I assume this would be the same, but now the downstream library would provide the DataContainer implementation? Are they going to need to make a different Container for each plot type they want to support?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants