
PEP 14: PySAL Integrations

Levi John Wolf edited this page Dec 18, 2016 · 1 revision


When PySAL gets feedback from users, two main points tend to recur.

  1. This isn't well integrated with the way I work with Python and the packages I use (beyond numpy).
  2. The documentation is comprehensive, but difficult to "discover." I've worked on a few related problems and had no clue that X or Y solution was implemented already in PySAL.

This is often driven by users who follow our example code. In those examples, weights are built directly from a shapefile, columns are pulled from the attribute table as flat numpy arrays, and everything is passed to computational classes. We have used this style for example code because it depends only on the internally-consistent ecosystem of functions, data models, and classes we have built; we can do this because we have built the tools to get from IO to analysis and back. Presenting this angle exclusively, however, makes it appear that you have to use our stuff up and down the stack, rather than showing how the library integrates with the tools users already work with.

Integrations are methods, functions, classes, or examples that show how a user might get into analysis from not-PySAL or out of analysis to not-PySAL.

Thus, we can directly address the first point and help resolve the second by showing off our integrations. A few ideas about how to both show our current integrations and make new ones consistent are discussed below.

  1. For new example code, prefer a pandas-based solution:
    • IO: pdio over open('filepath.shp').by_col_array()
    • Masking data before regression/ESDA computations (df.dropna(), df.replace(np.nan, ...))
    • Construction of fixed-effects/regime dummies (pd.get_dummies)
    • Instead of making a weights object and then post-processing it using Wsets, make the weights on the fly from a dataframe munged in pandas. Subsetting to a list of IDs becomes constructing directly from that subset: weights.Rook.from_dataframe(df.query('ID in @filterlist')). Intersection becomes constructing directly from an intersected dataset: weights.KNN.from_dataframe(pd.merge(df1, df2, how='inner')).
  2. Pushing for visibility, code contributions, and integration with other packages where possible
  3. Continuing to improve and extend classmethods like weights.Rook.from_dataframe will help extend our API. This means that, in quite a few cases, we can do new things without breaking the existing API. Classes can gain alternative "paths" into their __init__ function with .from_* constructors:
>>> LMTests.from_statsmodels(my_statsmodels_regression.fit())
>>> GM_Het_Combo.from_formula('patsy ~ regressors')
>>> LISA_Markov.from_timeseries(time_indexed_pandas_series)
>>> W.from_networkx(my_special_weights)
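The .from_* idea above boils down to a plain classmethod that adapts a foreign data structure into the arguments __init__ already expects. A minimal sketch of the pattern, using a hypothetical ToyW class and from_adjlist constructor (not actual PySAL API):

```python
# Sketch of the .from_* alternate-constructor pattern.
# `ToyW` and `from_adjlist` are hypothetical names, not PySAL API.

class ToyW:
    def __init__(self, neighbors):
        # canonical constructor: dict mapping id -> list of neighbor ids
        self.neighbors = neighbors
        self.n = len(neighbors)

    @classmethod
    def from_adjlist(cls, edges):
        """Alternate 'path' into __init__: build from (focal, neighbor) pairs."""
        neighbors = {}
        for focal, neighbor in edges:
            neighbors.setdefault(focal, []).append(neighbor)
            neighbors.setdefault(neighbor, []).append(focal)
        return cls(neighbors)

w = ToyW.from_adjlist([(0, 1), (1, 2)])
print(w.n)             # 3
print(w.neighbors[1])  # [0, 2]
```

Because from_adjlist simply prepares arguments and defers to __init__, adding it (or later .from_networkx, .from_dataframe variants) never disturbs existing callers of the canonical constructor.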

Classes can continue to gain alternative "export" options with .to_*:

>>> ML_Lag(Y, X, W).to_statsmodels()  # returns ResultsWrapper
>>> OLS(Y, X, W).to_file('./my_regression.txt')  # writes summary out to file
>>> Moran_Local(X, W).to_frame()  # returns a dataframe of I, p-value
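The .to_* exports can follow the same light convention: a method on the fitted object that serializes it into a foreign format or destination. A sketch of a to_file export on a hypothetical ToyOLS results class (the real PySAL signatures may differ):

```python
import io

# Hypothetical sketch of a .to_* export; `ToyOLS` is not PySAL API.
class ToyOLS:
    def __init__(self, betas):
        self.betas = betas

    def summary(self):
        lines = ['TOY OLS SUMMARY'] + [
            'beta_{}: {:.4f}'.format(i, b) for i, b in enumerate(self.betas)
        ]
        return '\n'.join(lines)

    def to_file(self, path_or_buffer):
        """Write the text summary out, in the spirit of OLS(...).to_file(...)."""
        if hasattr(path_or_buffer, 'write'):
            path_or_buffer.write(self.summary())
        else:
            with open(path_or_buffer, 'w') as f:
                f.write(self.summary())

buf = io.StringIO()
ToyOLS([1.5, -0.25]).to_file(buf)
print(buf.getvalue().splitlines()[0])  # TOY OLS SUMMARY
```

Accepting either a path or a file-like object keeps the export usable both in scripts and in pipelines that never touch disk.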

Or add commonly-used visualizations directly to the classes using soft dependencies.

>>> Moran_Local(X, W).plot()  # passes to a preconfigured geoplot
>>> W.plot()  # depends on matplotlib and bails if unavailable
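"Bailing if unavailable" can be centralized in one small guard that every soft-dependency method calls. This is an illustrative pattern, not current PySAL code; the require helper and plot_weights function are hypothetical names:

```python
import importlib
import importlib.util

def require(package, feature):
    """Raise an informative ImportError if a soft dependency is missing."""
    if importlib.util.find_spec(package) is None:
        raise ImportError(
            '{} requires the optional dependency {!r}; '
            'install it to enable this feature.'.format(feature, package)
        )
    return importlib.import_module(package)

# e.g. inside a hypothetical W.plot():
def plot_weights(w):
    mpl = require('matplotlib', 'W.plot()')  # bails here if unavailable
    ...  # plotting code would use mpl from here on

try:
    require('not_a_real_package', 'demo()')
except ImportError as e:
    print('caught:', e)
```

Keeping the check in one place means every .plot()-style method fails with the same actionable message, instead of an opaque ModuleNotFoundError deep in the call stack.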

This will let us extend the API without introducing breaking changes.