DOC: write out a prose narrative of the proposed design #14

tacaswell · 2022-11-03T16:50:35Z

No description provided.

docs/source/design.rst

tacaswell · 2022-11-03T17:42:09Z

I also discovered that mathjax does not work with singlehtml (it builds cleanl, adds the math tags etc but does not add mathjax to the output file) and I was getting local build issues due to the mpl-sphinx-theme.

We should decide if want to:

want to not use any math in the docs
switch to normal html (which it might be getting long enough we need to do that anyway)
fix the mathjax extension to work with singlehtml

tacaswell · 2022-11-03T19:21:29Z

Other relevant links:

https://gist.github.com/tacaswell/0e0f85433a0462d90aaf5d1e81486c8f a nearly minimal but functional plotting tool built on our backends. I think this may be a useful starting point to build out a "parable of git" style documentation or https://nsls-ii.github.io/bluesky/run_engine.html as the existing code has too much error handling, complicating features, optomizations, etc on it to be super clear
https://gist.github.com/tacaswell/95177903175dbc28be5353b4a0e5118f <- very early prototype / sketch of DataContainers (call "Sources" in that version). Signature is different (get vs query, no caching) and directly sub-classes existing Artists
https://paper.dropbox.com/doc/Matplotlib-4.0--BsQbaDIrvAh2ioCzz80zsEkZAQ-WTYwd0NQaSHTjtLUZwkNx old draft of a "mpl 4" proposal that feed into the grant proposals

story645

mostly grammar/spelling

docs/source/design.rst

story645 · 2022-11-03T20:51:18Z

docs/source/design.rst

+
+By accessing all of the data that is needed in draw in a single function call
+the ``DataContainer`` instances can ensure that the data is coherent and
+consistent.  This is important for applications like steaming where different


Suggested change

consistent. This is important for applications like steaming where different

consistent. This is important for applications like streaming where different

story645 · 2022-11-03T20:55:46Z

docs/source/design.rst

+the backend renderer understand.  This can range from coordinate
+transformations (as with the ``Transfrom`` stack operations on *x* and *y* like
+values), representation conversions (like named colors to RGB values), mapping
+stings to a set of objects (like named markershape), to paraaterized type


Suggested change

stings to a set of objects (like named markershape), to paraaterized type

stings to a set of objects (like named markershape), to parameterized type

story645 · 2022-11-03T20:57:53Z

docs/source/design.rst

+This will open up paths to do a number of nice things such as multi-variate
+color maps, lines who's width and color vary along their length, constant but
+parameterized colors and linestyles, and a version of ``scatter`` where the
+marker shape depends on the data.  All of these things are currently possible
+in Matplotlib, but require significant work before calling Matplotlib and can
+be very difficult to update after the fact.
+


what about explicitly using the language of vectorizing color, shape, size, etc mapping since that's the language we've been using internally?

story645 · 2022-11-03T20:59:05Z

docs/source/design.rst

+route of hashing all of the data is not a sustainable path (in the case even
+modestly sized data the time to hash the data will quickly out-strip any
+possible time savings doing the cache lookup!).  The proposed ``query`` method


in this case?

story645 · 2022-11-03T20:59:39Z

docs/source/design.rst

+the transform and size passed in.
+
+The choice to return the data and cache key in one step, rather than be a two
+step process is drive by simplicity and because the cache key is computed


Suggested change

step process is drive by simplicity and because the cache key is computed

step process is driven by simplicity and because the cache key is computed

story645 · 2022-11-03T21:00:03Z

docs/source/design.rst

+The choice to return the data and cache key in one step, rather than be a two
+step process is drive by simplicity and because the cache key is computed
+inside of the ``query`` call.  If computing the cache key is fast and the data
+to be returned in "reasonable" for the machine Matplotlib is running on (it


Suggested change

to be returned in "reasonable" for the machine Matplotlib is running on (it

to be returned is "reasonable" for the machine Matplotlib is running on (it

data_prototype/wrappers.py

docs/source/design.rst

jklymak · 2022-11-04T14:23:50Z

This is super helpful for me.

A lot of this design has to do with holding numpy arrays, and being able to subsample them on the fly, and carry some meta information around about the numpy arrays. I think that is great and will be a huge step forward for us. Where I'm still unclear is our interface with other data types and units.

How do we tell downstream or adjacent libraries how we will decode their objects, both just to get the numpy array, and to do unit conversion? Specifying this seems pretty key to me, and maybe is beyond just our remit. Pandas, xarray, have things like obj.to_numpy(), obj.values, and I think they have __array__ representations. I think we largely expect np.asarray(obj) to work. But that kills units.

Unit support is confusing as well. I think the confusion comes from whether elements of arrays should have units and that is where we detect the dtype, or whether the array itself should have the units. I feel we should put our foot down: I think arrays should have units, not array elements. I think that leaves lists of pint objects out in the cold, and I'm not even sure it works with the jpl_units interface, but I pretty strongly feel there is no reason to have arrays with mixed units, and that parsing unit information from either the array or its elements is far too confusing.

However, that has the same issue in that we need a standard for how arrays hold their unit information, or how we can access it. Or we just natively support object arrays (categorical) and np.datetime64 arrays (dates) and let downstream libraries write their own converters like we do now. However, if we do that, do we also make the conversion interface the way the downstream libraries extract the numpy array? Then we don't need to guess how they turn their object into an array. Note, that we don't do this presently.

You have probably thought through some of the above, but I think it's hard because unlike much of the rest of this proposal, it involves the interface with other projects, and probably the conversation should start soon.

Co-authored-by: Kyle Sunden <git@ksunden.space>

tacaswell · 2022-11-15T23:29:04Z

I think that is great and will be a huge step forward for us. Where I'm still unclear is our interface with other data types and units.

Everything goes under the DataContainer wrapper so it has a consistent API. We have not gotten to the design of the helper-functions to make fabricating these objects easy yet. I think one of the persistent issues with Matplotlib as it is now is that too much of the "lets make it easy to use" leaked down into the core of the library which makes it overly complicated and both harder to use and maintain in the long term.

Unit support is confusing as well. I think the confusion comes from whether elements of arrays should have units and that is where we detect the dtype, or whether the array itself should have the units.

I think this is primarily a problem with the auto-detection of units. Once we know how to handle a given input we should no care how the units are actually carried. That said, I have hopes that the numpy dytpe work will flatten much of the diversity in this space from our point of view.

jklymak · 2022-11-15T23:52:31Z

Everything goes under the DataContainer wrapper so it has a consistent API. We have not gotten to the design of the helper-functions to make fabricating these objects easy yet.

I guess thats what I'm asking: who makes the DataContainer? I guess the way we do things now is that we ask the converter to do it, so I assume this would be the same, but now the downstream library would provide the DataContainer implementation? Are they going to need to make a different Container for each plot type they want to support?

It is now too long.

ksunden reviewed Nov 3, 2022

View reviewed changes