Individual Plans H1 2023
This page is a place for @bokeh/core to collect high-level thoughts and informal plans for things they expect or hope to be able to work on in the next ~6 months.
There is historical design cruft around the division between some glyphs and some annotations. I would like us to agree, as a group, on a consistent plan forward that will clear up the current situation (with any necessary deprecations), as well as guide future work when new drawing features are added. A specific outcome is to clearly answer questions like:
- what are firm technical differences between glyphs and annotations
- what features and APIs should all annotations support (are there sub-categories?)
- what are consistent naming conventions we can adopt now and carry into future work
Keeping in mind that sometimes things may fall into both categories, e.g. we may want a Text glyph, and also a vectorized Labels (since the term "label" has widespread significance and meaning within the dataviz world). They should share as much implementation as possible, but also clearly demonstrate the conceptual intentions behind "glyph" and "annotation" individually.
This work is primarily planning and discussion, but eventually someone(s) will do work around the glyph and annotations modules of both Bokeh and BokehJS.
- revamp example handling
  I think we will want to go back to having a custom parser for .py files. Now that examples have been consolidated under examples we can point the custom parser there, and have it pre-process all the files up front. This will allow us to have better control over caching, etc. and to avoid re-evaluating examples unnecessarily. We may also want to explore options like switching to JSON embeds rather than relying on old and complicated autoload_static so much.
- speed up docs build
  Our docs currently take very long to build. The examples work described above may help improve things, but is probably not the entire story.
- auto API linking
  Example code in our docs should auto-link. There are some existing tools that almost work out of the box, at least for functions and classes, etc. But realistically we would like auto-links to our model properties as well. This may necessitate developing a custom Sphinx extension yet again.
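As a rough illustration of what auto-linking to model properties could involve, the sketch below uses Python's standard ast module to find names imported from bokeh in an example and the keyword arguments passed to them. The function names linkable_names and property_references are hypothetical, and treating every keyword argument as a candidate property link is an assumption made for illustration, not a description of any existing tool.

```python
import ast


def linkable_names(source: str, package: str = "bokeh") -> dict[str, str]:
    """Map local names in an example to the qualified names they were imported as."""
    targets: dict[str, str] = {}
    for node in ast.walk(ast.parse(source)):
        # Only absolute "from package.x import y" statements are considered here.
        if isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            if node.module == package or node.module.startswith(package + "."):
                for alias in node.names:
                    targets[alias.asname or alias.name] = f"{node.module}.{alias.name}"
    return targets


def property_references(source: str, targets: dict[str, str]) -> list[str]:
    """List candidate 'qualified.name.keyword' link targets for calls to linkable names."""
    refs: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            qualified = targets.get(node.func.id)
            if qualified:
                refs.extend(f"{qualified}.{kw.arg}" for kw in node.keywords if kw.arg)
    return refs


example = """
from bokeh.models import Range1d
from bokeh.plotting import figure

p = figure(width=300)
r = Range1d(start=0, end=10)
"""

names = linkable_names(example)
print(names)
print(property_references(example, names))
```

A real extension would still need to check each candidate against the actual properties of the model before emitting a link, which is where hooks into the model system might help.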
Any work should be mostly confined to bokeh.sphinxext and docs/bokeh in the main repo. However, it is possible we may want to add hooks into the model system to facilitate API auto linking.
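For the example-handling item above, one possible way to avoid re-evaluating examples unnecessarily is to key a cache on the content of each .py file. The sketch below is a minimal illustration under that assumption; the manifest file, the function names, and the use of SHA-256 digests are all hypothetical choices, not part of the current docs build.

```python
import hashlib
import json
from pathlib import Path


def content_digest(path: Path) -> str:
    """Digest of the file contents, so edits are detected regardless of timestamps."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def current_digests(examples_dir: Path) -> dict[str, str]:
    """Digest every .py file under examples_dir, keyed by its relative path."""
    return {
        path.relative_to(examples_dir).as_posix(): content_digest(path)
        for path in sorted(examples_dir.rglob("*.py"))
    }


def stale_examples(examples_dir: Path, manifest_path: Path) -> list[Path]:
    """Return the example files that are new or changed since the manifest was written."""
    old = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    return [
        examples_dir / key
        for key, digest in current_digests(examples_dir).items()
        if old.get(key) != digest
    ]


def write_manifest(examples_dir: Path, manifest_path: Path) -> None:
    """Record the current digests after a successful pre-processing pass."""
    manifest_path.write_text(json.dumps(current_digests(examples_dir), indent=2))
```

A pre-processing step could call stale_examples up front, evaluate only the returned files, and then call write_manifest once they have been processed.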
The demo site was recently re-built using terraform. A full infrastructure tear-down and setup using terraform requires highly elevated AWS credentials. However, the current overall infrastructure really should not need to change at this point. Instead, it should be possible to simply swap out new containers for new Bokeh versions into the existing infrastructure. For this smaller task, a well-scoped set of minimal credentials can be determined and automation set up so that anyone can kick off an update of the site when needed.
Any work should be confined to https://github.com/bokeh/demo.bokeh.org and AWS, and I do not anticipate any impact on work ongoing in the main repo.
- Integration tests
  Our integration tests are currently disabled due to flakiness. Some level of full end-to-end cross-runtime testing capability needs to be restored.
- Notebook tests
  We are ten years overdue for having any automated testing that actually exercises real notebooks. It's not clear what the best approach is, so some exploration and discussion will be necessary.
I would expect work to be mostly contained under bokeh/tests and .github/workflows.
We have a working implementation of contour plots, but improvements are needed to make it really useful. These ideas were originally in the Contouring Roadmap discussion.
- Automatic calculation of contour levels. User specifies the number of levels required and they are calculated based on the supplied data limits. There are possibilities for linearly and logarithmically spaced levels, and linearly symmetric about zero, maybe more later. User may not receive exactly the requested number of levels as the requirement that the levels are sensibly spaced is more important.
- Ways of specifying vector visual properties without knowing the number of levels in advance. This applies to all fill and line visual properties, and there will be extra palette-specific possibilities for fill_color and line_color. I am expecting the validation and calculation of these properties to occur on the Python side as all other contour validation and calculation occurs there.
- Extending colorbar above and below the level range for filled contours. Before this, there are always contour lines calculated and drawn at the lower and upper level limits. If you "extend above" there will be an additional set of filled polygons from the maximum level upwards. If you "extend below" there will be an additional set of filled polygons from the minimum level downwards. This needs a sensible API and a way of indicating it visually on a colorbar, e.g. through the use of filled triangles at the upper and/or lower limits.
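To make the automatic level calculation in the first item more concrete, here is a minimal sketch of one common approach: round the raw spacing up to 1, 2 or 5 times a power of ten, then cover the data limits with multiples of that step. The function names and the choice of rounding rule are assumptions for illustration, not the planned Bokeh API. It also shows why the number of levels returned can differ from the number requested.

```python
import math


def nice_step(raw_step: float) -> float:
    """Round a positive step up to 1, 2, 5 or 10 times a power of ten."""
    base = 10.0 ** math.floor(math.log10(raw_step))
    fraction = raw_step / base
    for nice in (1.0, 2.0, 5.0, 10.0):
        if fraction <= nice:
            return nice * base
    return 10.0 * base  # guards against floating point round-off


def linear_levels(zmin: float, zmax: float, n: int) -> list[float]:
    """Sensibly spaced linear levels covering [zmin, zmax]; n is only a hint."""
    if not zmax > zmin or n < 2:
        raise ValueError("need zmax > zmin and at least two levels")
    step = nice_step((zmax - zmin) / (n - 1))
    first = math.floor(zmin / step)
    last = math.ceil(zmax / step)
    return [i * step for i in range(first, last + 1)]


def symmetric_levels(zmin: float, zmax: float, n: int) -> list[float]:
    """Linear levels that are symmetric about zero."""
    limit = max(abs(zmin), abs(zmax))
    return linear_levels(-limit, limit, n)


def log_levels(zmin: float, zmax: float) -> list[float]:
    """One level per power of ten covering [zmin, zmax]; requires zmin > 0."""
    if not zmax > zmin > 0:
        raise ValueError("need zmax > zmin > 0")
    first = math.floor(math.log10(zmin))
    last = math.ceil(math.log10(zmax))
    return [10.0 ** e for e in range(first, last + 1)]


# Five levels requested, three returned: sensible spacing takes priority.
print(linear_levels(0.3, 9.7, 5))    # [0.0, 5.0, 10.0]
print(symmetric_levels(-3, 7, 5))    # [-10.0, -5.0, 0.0, 5.0, 10.0]
```

In this sketch the outermost levels may lie outside the data limits; whether that is desirable is one of the API decisions the item above leaves open.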
These ideas are mostly from the WebGL Roadmap discussion.
- Improved single line rendering. I have an idea for a different approach in the line shaders that could be both simpler code and faster to run. It should also address some of the current dashed line limitations.
- Dashed line support for markers, meaning fixed-shape glyphs. Initially I am only considering some of the more commonly used and easy to implement shapes such as squares and circles.
- Multiline glyph. The WebGL part of this is fairly easy, but it also requires a restructuring of the BokehJS rendering loop so that a single render call can blit and clear the WebGL canvas multiple times.
- Arbitrary area glyphs, including with holes in them. This is going to need JavaScript or WASM tessellation/triangulation functionality.
- Mechanisms to minify the shader code, and allow insertion of code into shaders. For example, we only want a single copy of the code that draws the various hatch patterns, and this needs to be used in both marker and arbitrary area shaders.
Note that I am not aiming here for full WebGL support.
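The shader code insertion and minification item can be pictured as a small build-time text transform. The sketch below, written in Python purely for illustration, expands a hypothetical #include "name" directive from a table of shared snippets and then strips comments and blank lines. The directive syntax, the snippet name hatch.glsl and the function names are assumptions; BokehJS may well choose a different mechanism.

```python
import re

# Hypothetical directive: a line of the form  #include "name"
INCLUDE = re.compile(r'^[ \t]*#include\s+"([^"]+)"[ \t]*$', re.MULTILINE)


def expand_includes(source: str, snippets: dict[str, str], _stack: tuple = ()) -> str:
    """Replace each include line with the named snippet, recursively."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name in _stack:
            raise ValueError(f"circular include of {name!r}")
        if name not in snippets:
            raise KeyError(f"unknown shader snippet {name!r}")
        return expand_includes(snippets[name], snippets, _stack + (name,)).rstrip("\n")

    return INCLUDE.sub(replace, source)


def strip_comments_and_blank_lines(source: str) -> str:
    """A deliberately naive minifier: drop comments, indentation and empty lines."""
    source = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)
    source = re.sub(r"//[^\n]*", "", source)
    lines = (line.strip() for line in source.splitlines())
    return "\n".join(line for line in lines if line)


# A single copy of the hatch code, shared by two different shaders.
snippets = {"hatch.glsl": "// shared hatch code\nfloat hatch(vec2 p) { return 0.0; }\n"}
marker_shader = '#include "hatch.glsl"\nvoid main() { hatch(vec2(0.0)); }\n'
area_shader = '#include "hatch.glsl"\nvoid main() { hatch(vec2(1.0)); }\n'

for shader in (marker_shader, area_shader):
    print(strip_comments_and_blank_lines(expand_includes(shader, snippets)))
```

Line breaks are kept in the minified output because preprocessor directives in shader source need to stay on their own lines.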
I'd like documentation on BokehJS to be auto-generated, whether that is standalone or part of the Sphinx doc build system. Primarily my interest is to provide contributors with better information on the API including the availability or not of handy utilities such as ndarray classes and functions. But it should also be useful for users who are interested in accessing BokehJS functionality through extensions and/or callbacks.
Add support for:
- multiple plots per canvas and arbitrary plot positioning
- layouts of legends, color bars and possibly other annotations
- legends, color bars and plot independent layouts on separate canvases
Allow all data intensive computations (set_data, map_data, hit testing, etc.) and painting (using offscreen canvas where applicable) to be performed off the main thread in dedicated web workers. In the longer term, consider performing all data processing in web workers, including receiving data through web sockets. Focus on making bokehjs' UI more responsive when handling large amounts of data. This work needs to be coordinated with adding support for web assembly and bokehjs packaging reorganization.
Introduce web assembly to bokehjs, to allow using tooling and a programming language better suited for handling data. A language and a corresponding tool chain need to be chosen. Currently Rust is under evaluation in PR #12961. Similarly to multi-threading support, consider re-implementation of data processing logic. Consider utilization of SIMD where applicable. Add support for 64-bit integer arrays, handle complex dtypes and non-native number types (e.g. fixed-point arithmetic). Generally improve support for handling ndarrays. Consider supporting other data/array formats (e.g. arrow).
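One reason 64-bit integer arrays need dedicated support is that a plain JavaScript number is an IEEE 754 double, which cannot represent every 64-bit integer exactly. Python's float uses the same format, so the effect can be shown with a short sketch; the helper names are hypothetical and this is only an illustration of the precision limit, not of how bokehjs handles arrays.

```python
import struct

MAX_SAFE_INTEGER = 2**53 - 1  # same value as JavaScript's Number.MAX_SAFE_INTEGER


def survives_float64(value: int) -> bool:
    """True if value is unchanged after a round trip through a 64-bit float."""
    return int(float(value)) == value


def survives_int64(value: int) -> bool:
    """True if value is unchanged after a round trip through 8 bytes of signed int64."""
    (restored,) = struct.unpack("<q", struct.pack("<q", value))
    return restored == value


big = 2**53 + 1
print(survives_float64(MAX_SAFE_INTEGER))  # True
print(survives_float64(big))               # False
print(survives_int64(big))                 # True
```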
Rethink how we build, package and bundle bokehjs. Specifically reduce or completely eliminate usage of tsc's compiler APIs in favour of a faster tool chain (e.g. swc). Split bokehjs into smaller self-contained packages. Split-off plotting/vis code into its own bundle. Finalize support for ESM bundles. Investigate alternative bundling schemes (e.g. use scope hoisting and eliminate Bokeh.require()).
Add support for validating that the shape and types of data match the capabilities of the associated glyphs. Currently it's impossible to verify whether the supplied data makes sense, at least not until bokehjs tries to process it. Allow reporting many or all issues during validation, instead of quitting on the first one. This principle should be applied more generally across bokeh and bokehjs.
Separately, make bokehjs more error resilient by not allowing unhandled exceptions. Any such exceptions should only indicate bugs and not usage errors. All usage errors should be either recoverable or presented in the UI with an explanation of how to recover from them manually.
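The idea of reporting many or all issues, rather than quitting on the first one, can be sketched as a validator that accumulates findings and returns them together. The Issue class, the validate_columns function and the specific checks below are hypothetical and simplified; they are not the validation rules Bokeh would actually apply to glyph data.

```python
from dataclasses import dataclass
from numbers import Real


@dataclass(frozen=True)
class Issue:
    column: str
    message: str


def validate_columns(data: dict[str, list], required: dict[str, type]) -> list[Issue]:
    """Check every required column and collect all problems before returning."""
    issues: list[Issue] = []
    lengths: dict[str, int] = {}
    for column, expected in required.items():
        if column not in data:
            issues.append(Issue(column, "missing column"))
            continue  # keep checking the remaining columns
        values = data[column]
        lengths[column] = len(values)
        bad = [i for i, value in enumerate(values) if not isinstance(value, expected)]
        if bad:
            issues.append(Issue(
                column,
                f"{len(bad)} value(s) are not {expected.__name__}, first at index {bad[0]}",
            ))
    if len(set(lengths.values())) > 1:
        issues.append(Issue("*", f"column lengths differ: {lengths}"))
    return issues


# Hypothetical requirements for a circle-like glyph.
required = {"x": Real, "y": Real, "radius": Real}
data = {"x": [1, 2, 3], "y": [1.0, "oops"]}

for issue in validate_columns(data, required):
    print(f"{issue.column}: {issue.message}")
```

Returning a list instead of raising lets the caller decide whether to show every problem in the UI, log them, or raise a single error that summarizes all of them.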
Create a JSON schema for bokeh's protocol and in general for any JSON generated by bokeh (base schema). In the long term, create a tool for generating a schema for all models and their properties (detailed schema). Having a schema for the protocol should make it easier to create new tools targeting bokeh's protocol and bokehjs.
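As a rough picture of the detailed schema tool, the sketch below turns a simplified description of a model's properties into a JSON Schema document. The property kinds, the mapping table, the model name ExampleModel and the function model_schema are all hypothetical; a real generator would have to introspect Bokeh's actual property system and follow the real serialization format.

```python
import json

# Hypothetical mapping from simplified property kinds to JSON Schema fragments.
KIND_TO_SCHEMA = {
    "Int": {"type": "integer"},
    "Float": {"type": "number"},
    "String": {"type": "string"},
    "Bool": {"type": "boolean"},
}


def model_schema(name: str, properties: dict[str, str]) -> dict:
    """Build a JSON Schema object describing the attributes of one model."""
    return {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "title": name,
        "type": "object",
        "properties": {
            prop: dict(KIND_TO_SCHEMA[kind]) for prop, kind in properties.items()
        },
        "additionalProperties": False,
    }


# A made-up model with three properties, for illustration only.
schema = model_schema("ExampleModel", {"start": "Float", "end": "Float", "visible": "Bool"})
print(json.dumps(schema, indent=2))
```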
- Continuously improve readability and structure
- Update to the most recent versions of Sphinx (currently pinned to 5.1) and the theme (currently pinned to 0.9)
- Help with automating documentation
- Keep improving the new tutorial https://github.com/bokeh/tutorial
- Potentially record tutorial videos of some or all chapters
- Coordinate participation in Outreachy's May 2023 round
- Coordinate presentations and sprints at conferences this year, esp. SciPy US
- Coordinate the call for JS-focused Bokeh core-dev
- Work with Victoria on regular social and blog posts
- Work on proposals for improving Bokeh's accessibility; we can apply to NASA's HPOSS grant and the next CZI EOSS round.
- Update bokeh/pm repository
- Create and publish a Privacy Policy
- Look into setting up a different analytics tool (move away from GA)