Sergio Rey edited this page Dec 13, 2016 · 4 revisions

PEP 12: Refactor PySAL Code Base

PEP: 12
Title: Refactor PySAL Code Base
Author: Serge Rey
Status: Draft
Type: Infrastructure
Created: 2016-11-28
Post-History: 2016-11-28


Motivation

PySAL is approaching its seventh birthday. We have come a great distance, added many awesome features to the library, and have much to be proud of since July 2010. Equally important, the vision for PySAL at its birth remains intact today: to bring advanced spatial analytics to the Python ecosystem, with two goals in mind. The first is transparency, to support the education and training of the next generation of spatial scientists. The second is the implementation of state-of-the-science methods that address unmet or underserved needs.

Over time, however, there has been tension between these two goals, as transparency and pedagogy have led the project to prioritize simple installation over leveraging powerful, but at times difficult to install, third-party libraries. This has been behind the criticism that PySAL had (or has) a 'not invented here' mentality. From the perspective of today's development environment, where there is an abundance of rich libraries to choose from, this criticism has a ring of truth to it. Yet, from the longer view, it misses the mark badly: implementing our own shapefile reader, for example, happened in an era when there was "no there, there". We had to do it because nothing else was available.

Things have no doubt improved since 2010 in terms of the ecosystem of libraries for spatial data processing. Yet, there are still pain points to be encountered when trying to exploit these packages. Whenever we have faced a trade-off between adopting the latest and greatest library and increasing the installation burden on our users, we have chosen ease of installation.

Putting aside the trade-off between the goals of pedagogy and speed/efficiency, the code base has grown in such a way that the project is facing significant increases in both test time and the developer costs of implementing new features and optimizations. In short, maintenance costs are impinging on advances. The opaque nature of the code base, documentation, and testing frameworks also presents formidable barriers to entry for new developers. Today's new developers will be tomorrow's core developers, and our current processes may be poorly suited to supporting that transition.

PySAL has also benefited enormously from the contributions of its talented developer team, yet because of the structure of the code base and its bifurcation into "core" and "contrib" modules, these contributions often do not receive the recognition they truly deserve. This structure also discourages a broader sense of ownership in the project.

Thus, it seems a good time to take stock and consider if a restructuring of the code base can be carried out that addresses these concerns with an eye towards an even brighter future for the project.

In what follows, several candidate approaches to refactoring PySAL are put forth. A number of questions that arise in considering the general refactoring, as well as the specific approaches, are also highlighted.

Approaches

At this point, the BDFL (Benevolent Dictator For Life) sees three possible paths forward:

  1. Jupyter model
  2. Monolithic contrib integration
  3. Status quo

Jupyter model

The transition of the IPython project into the Jupyter project provides an impressive example of a major code base that underwent substantial refactoring and not only survived but came out stronger for the effort. Learning from this experience seems a wise route to travel as we consider whether and how to refactor PySAL.

  • jupyter (a meta-package that installs the following)
    • notebook
    • qtconsole
    • jupyter_console
    • nbconvert
    • ipykernel
    • ipywidgets

All but the last two of these packages have their repositories at github.com/jupyter.

The packages that the meta-package installs can each have their own dependencies, and some of these, such as jupyter_core, are shared across the packages.

How might this look for PySAL?

A rough sketch is as follows:

  • pysal (becomes a meta-package)

    • esda
    • mapclassification
    • spreg
    • spatial_dynamics
    • regionalization
    • inequality
    • viz
    • spint
    • points
  • pysal_core

    • io
    • core
    • pdio
    • weights

The packages provided by the one-stop-shopping meta-package pysal would each potentially have pysal_core as a dependency. In other words, pysal_core would be where we place the common spatial processing components, while all the analytics would live in their own packages that consume these processing functions.

Each of the packages listed underneath the meta-package would also be installable on its own, allowing advanced developers and users to pick and choose for a lean, customized PySAL installation.
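To make the layering concrete, the proposed layout can be modeled as a small dependency graph. This is a minimal sketch, assuming every analytic package depends only on pysal_core and that the pysal meta-package ships no code of its own. The package names come from the rough sketch above and are hypothetical distributions, and `install_set` is an illustrative helper, not part of pip, conda, or any existing PySAL tooling.

```python
# Hypothetical model of the proposed meta-package layout.
# None of these names is assumed to be a published distribution.

# Analytic packages from the sketch; each is ASSUMED to depend only on pysal_core.
ANALYTIC_PACKAGES = [
    "esda", "mapclassification", "spreg", "spatial_dynamics",
    "regionalization", "inequality", "viz", "spint", "points",
]

# Direct dependencies of each distribution.
DEPENDENCIES = {"pysal_core": []}
for name in ANALYTIC_PACKAGES:
    DEPENDENCIES[name] = ["pysal_core"]
# The meta-package simply pulls in every analytic package.
DEPENDENCIES["pysal"] = list(ANALYTIC_PACKAGES)


def install_set(requested):
    """Return every distribution needed to install the requested packages.

    Walks the dependency graph depth-first and collects the transitive closure.
    """
    needed = set()
    stack = list(requested)
    while stack:
        name = stack.pop()
        if name in needed:
            continue
        needed.add(name)
        stack.extend(DEPENDENCIES[name])
    return needed


if __name__ == "__main__":
    # A lean install: exploratory analysis and regression only.
    print(sorted(install_set(["esda", "spreg"])))
    # The one-stop-shopping install pulls in the meta-package, all nine
    # analytic packages, and pysal_core.
    print(len(install_set(["pysal"])))
```

In practice the resolution would be done by pip or conda from the requirements each package declares; the point of the sketch is only that a lean install of esda and spreg brings in pysal_core and nothing else, while installing pysal brings in everything.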

This does not cover all the packages currently in contrib. We would have to decide where each of them fits (in pysal_core, or as its own package à la spint/spreg).

Each package would have its own repository under the PySAL organization's GitHub account.

Each of the packages would have a lead maintainer who would be responsible for that code base and its integration with the larger project.

We would likely benefit from adopting a similar directory structure across each of the packages, as well as guidelines and practices for testing and documentation.

Benefits

  • greater attribution and credit for the maintainers
  • broader ownership of the project
  • relaxing of dependency restrictions
  • modularization of the code base
  • shorter test cycles

Costs

  • increased coordination costs
  • potentially more difficult for end-user installation
  • potential loss of backward compatibility

Monolithic Contrib Integration

An alternative path is to maintain a single monolithic repository that removes the contrib distinction by moving the modules currently under contrib into the top-level directory of the project.

Benefits

  • relaxing of dependency restrictions

Costs

  • monolithic nature of code base not addressed
  • potentially more difficult for end-user installation
  • long test cycles for pull requests

Status quo

We always have the option to continue with what we are doing if, after a full consideration of the issues and alternatives, we feel this is the best path forward. While we are fully aware of the pain points with the current structure, that awareness comes from a lot of hard-earned experience. The other two suggested alternatives may have pain points we have yet to experience.

Benefits

  • backwards compatibility maintained
  • the devil we know
  • advanced users can benefit from contrib

Costs

  • pain points remain (see Motivation)

Questions/Concerns

  • User experience
  • Backwards compatibility
  • Preserving commit and project history
  • Testing
  • Release process
  • Governance
  • Documentation
    • standards
    • repository-specific

Proposed Schedule

Having explored some of the issues around refactoring, the BDFL proposes the following schedule.

  • December 2016
    • Discussion of draft proposal
    • Identification of possible alternatives
    • Identification of other possible issues
    • Refinement of proposal
      • Proposal of alternatives
  • January 2017
    • Vote on alternative options regarding refactoring
    • Formation of teams to explore top two proposals
  • February-March 2017
    • Exploration of alternative implementations
    • Monthly reports to the dev team on progress/issues
  • April 2017
    • Vote on final refactoring question

This schedule reflects the expectation that not all the issues have been identified at this point, that the devil is always in the details, and that other members of the dev team have not yet had the opportunity to provide insights. There could also be other alternatives, or modifications to the three approaches above, that we should consider.

This timeline is admittedly conservative and is only intended to fuel the discussion. If the consensus is that we can move forward in a shorter time period, we will do that.