
Google Summer of Code 2022

Martin Fleischmann edited this page Mar 24, 2022 · 17 revisions

PySAL invites students to join its development by applying for Google Summer of Code 2022. This is the seventh year PySAL is seeking to participate, and we are applying under the NumFOCUS umbrella organization.

Introduction

PySAL is an open source library of spatial analysis functions written in Python, intended to support the development of high-level applications. See our documentation for more details. The developer guide describes in more detail how to contribute to PySAL and our contribution workflow. Our issues are also on GitHub and include bug reports, 'wishlist' items, and enhancement plans and ideas.

If you are interested in participating in GSoC as a student, the best approach is to become an active and engaged contributor to the project right away. Take a look at the existing issues on GitHub and see if there are any you think you might be able to take a crack at. Try submitting a pull request for something to get the hang of the process and to start interacting with the PySAL code base and development community. It is a good idea to start on your proposal early, post a draft to the PySAL chat room, and iterate based on the feedback you receive. This will not only improve the quality of your proposal, but also help you find a suitable mentor.

Project Ideas

Below is a listing of possible projects that students might consider. We also encourage students to propose their own projects, though several of the following topics are relatively high on our priority list. Our priority list is flexible, and it is important that the topic matches the interest and background of the student.

When considering the following projects, don't be put off by the knowledge prerequisites -- you don't need to be an expert, and there is some scope for research and learning within the GSoC period. However, familiarity with and interest in the subject area and involved technologies will be helpful!

Exploratory Spatial Data Analysis (ESDA) is one of the most important steps in understanding spatial data. A range of statistics has been proposed over the years, with more on the horizon. However, there remains a need to implement these statistics in contemporary programming languages. PySAL is looking for a GSoC student who would be interested in implementing statistics like:

Difficulty Level:

Dependent on the statistic, anywhere from easy to hard. Consult the difficulty ratings on the statistics themselves.

Expected outcomes:

User-class implementations of a set of the above statistics
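As a rough illustration of what a user-facing class for a spatial statistic might look like, here is a minimal sketch. Global Moran's I is used purely because it is a familiar statistic; it is not one of the statistics proposed for this project, and the class name `MoranSketch` and the dict-of-dicts weights format are assumptions made for this example, not the esda API.

```python
class MoranSketch:
    """Hypothetical user class computing global Moran's I (illustration only)."""

    def __init__(self, y, w):
        # y: list of observed values
        # w: dict mapping observation index i -> {neighbour index j: weight}
        n = len(y)
        mean = sum(y) / n
        z = [v - mean for v in y]  # deviations from the mean
        # S0 is the sum of all spatial weights
        s0 = sum(wij for nbrs in w.values() for wij in nbrs.values())
        # cross-products of deviations between neighbouring observations
        num = sum(
            wij * z[i] * z[j]
            for i, nbrs in w.items()
            for j, wij in nbrs.items()
        )
        den = sum(v * v for v in z)
        self.I = (n / s0) * (num / den)


# Four observations along a line, with binary contiguity weights.
w = {0: {1: 1}, 1: {0: 1, 2: 1}, 2: {1: 1, 3: 1}, 3: {2: 1}}
trend = MoranSketch([1, 2, 3, 4], w)         # similar values are neighbours
alternating = MoranSketch([1, -1, 1, -1], w)  # dissimilar values are neighbours
print(trend.I, alternating.I)
```

A smooth trend gives a positive value and an alternating pattern gives a negative one. A real implementation would also need inference (for example, permutation-based p-values), which this sketch omits.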

Skills required/preferred:

  • knowledge of Python (required)
  • introductory statistics (required)
  • spatial statistics experience (preferred)

Mentors

  • Levi Wolf (@ljwolf)
  • Serge Rey (@sjsrey)

Project Size

Depends on the number and kind of estimators chosen.

2 estimators will take ~175 hours, 4+ will take ~350 hours.

Interfaces for consistent statistical analysis

Our packages for exploratory spatial data analysis (ESDA), spatial regression, and map classification are heavily used parts of our library. The APIs for these packages vary significantly and can be challenging for new users. Setting out a clear and consistent pattern, such as that defined in scikit-learn, can make our library easier to compose, use, and extend. Thus, this project would accept students interested in exploring a code covenant for these packages. The code would primarily be interface code, not statistical code, and we are seeking students to

  1. separate estimation, validity-checking, and diagnostic code from our current classes
  2. co-develop (or adopt existing) consistent APIs for supervised and unsupervised geographical estimators
  3. implement these new classes using a mix-in strategy, as seen in Scikit-Learn, alongside the stripped-down functions for estimation.
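The three steps above can be sketched in plain Python. This is only an illustration of the pattern under stated assumptions: the names `check_inputs`, `spatial_lag`, `FitMixin`, `DiagnosticsMixin`, and `SpatialLag` are hypothetical and exist in neither PySAL nor scikit-learn. Only the conventions (a `fit()` that returns `self`, fitted attributes with a trailing underscore, behaviour shared through mix-ins) are borrowed from scikit-learn.

```python
def check_inputs(y, w):
    """Validity checking, kept separate from estimation."""
    if len(y) != len(w):
        raise ValueError("y and w must describe the same number of observations")
    if any(not nbrs for nbrs in w.values()):
        raise ValueError("every observation needs at least one neighbour")
    return [float(v) for v in y]


def spatial_lag(y, w):
    """Stripped-down estimation function: weighted mean of neighbours' values."""
    lag = []
    for i in range(len(y)):
        total = sum(w[i].values())
        lag.append(sum(wij * y[j] for j, wij in w[i].items()) / total)
    return lag


class FitMixin:
    """Hypothetical mix-in providing a scikit-learn-style fit()."""

    def fit(self, y, w):
        y = check_inputs(y, w)
        self.result_ = self._estimate(y, w)  # trailing underscore marks fitted state
        return self                          # returning self allows chaining


class DiagnosticsMixin:
    """Hypothetical mix-in holding diagnostic helpers."""

    def is_fitted(self):
        return hasattr(self, "result_")


class SpatialLag(FitMixin, DiagnosticsMixin):
    """Thin class that only wires the estimation function into the mix-ins."""

    def _estimate(self, y, w):
        return spatial_lag(y, w)


w = {0: {1: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {1: 1.0}}
model = SpatialLag().fit([1, 2, 3], w)
print(model.result_)
```

Keeping `spatial_lag` as a plain function means it stays usable without the class, while the class contributes only the shared interface.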

Mentors

  • Levi Wolf (@ljwolf)
  • Taylor Oshan (@tayoshan)
  • Serge Rey (@sjsrey)

Project Size

350 hours

Expected outcomes

Formula interfaces and/or scikit-style estimators for classes across esda and spreg.

Skills required

  • familiarity with scikit-learn (required)
  • familiarity with R-like formula syntax (preferred)
  • experience using spatial statistical models and methods (preferred)

Difficulty

Hard, mainly due to both subject depth & breadth of changes required across the library.

Open Source Facility Location Modeling (spopt) Development

Facility location modeling is critically important in both public- and private-sector application, planning, and decision-making contexts. Currently, most facility location modeling is implemented in commercial software, such as the Location-Allocation tool in ArcGIS Network Analyst. Some open-source tools on this topic have been developed in both Python and R, such as PySpatialOpt and Maxcovr. However, a comprehensive open-source facility location modeling toolset is still needed to lower the barrier to implementing facility location models in the future.

We had a very successful GSoC project for the spopt package last year, in which we developed four basic models: LSCP, MCLP, P-median, and P-centre. This year PySAL is looking for a GSoC student who would be interested in further improving the spopt package by implementing models like:

  • Backup Coverage Location Problem
  • p-Dispersion (dispersion model) under different neighborhood restriction and max-min-min dispersion
  • model components such as facility capacity, demand unit shape, distance metric, and solution approach, incorporating open-source GIS tools (NetworkX, GeoPandas, Shapely)
  • see examples in example1, example2, and example3
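To give a feel for the kind of problem these models solve, here is a minimal sketch of the P-median problem mentioned above, solved by brute-force enumeration. This is not how spopt formulates or solves it: the function name `p_median_brute_force` is hypothetical, demand is assumed to be unweighted, and enumeration is only workable for a handful of candidate sites.

```python
from itertools import combinations


def p_median_brute_force(cost, p):
    """Choose p sites minimising the total distance to each point's nearest site.

    cost[i][j] is the distance from demand point i to candidate site j.
    """
    n_sites = len(cost[0])
    best_sites, best_total = None, float("inf")
    for sites in combinations(range(n_sites), p):
        # each demand point is served by its closest selected site
        total = sum(min(row[j] for j in sites) for row in cost)
        if total < best_total:
            best_sites, best_total = sites, total
    return best_sites, best_total


# Four demand points (rows) and three candidate sites (columns).
cost = [
    [0, 4, 9],
    [3, 0, 5],
    [9, 5, 0],
    [10, 6, 2],
]
print(p_median_brute_force(cost, 1))  # -> ((1,), 15)
print(p_median_brute_force(cost, 2))  # -> ((0, 2), 5)
```

The number of combinations grows quickly with the number of sites, which is why practical implementations hand the problem to an optimization solver.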

Related Reading

Skills

  • interest in facility location modeling and spatial optimization
  • knowledge and experience with facility location modeling theories and optimization solvers are preferred

Difficulty Level

intermediate

Mentors

Project Size

350 hours

Expected outcomes:

Ready-to-use implementations of a subset of optimization models above

Skills:

  • knowledge of Python (required)
  • Linear algebra (preferred)
  • Linear programming (preferred)
  • Spatial Optimization (preferred)

Street network simplification

Broadly speaking, street network analysis can be split into two subgroups: transportation-focused and morphological. Although these subgroups frequently overlap, they each require slightly different input data. The former often needs detailed transportation geometry, e.g. roundabouts, dual carriageways, etc. The latter needs a network cleared of such structures. The issue is that available data (like street networks from OpenStreetMap) often follow the first model, which complicates most morphological analysis. PySAL's momepy, a module focusing on morphology, currently lacks advanced network preprocessing methods that would simplify a transportation-based network into a morphological one.

The successful GSoC project will develop tools within the momepy.preprocessing module that enable parametrized simplification and topological correction of street networks, dealing with node and edge consolidations. Furthermore, the method should be source-agnostic (unlike some simplification methods in OSMnx depending on OSM tags) to allow wide applicability.
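One ingredient of such a tool, node consolidation, can be sketched in plain Python. This is an assumption-laden illustration, not momepy's API or a proposed design: the function name `consolidate_nodes`, the coordinate-dict input, and the single `tolerance` parameter are all hypothetical. Nodes closer than the tolerance are merged transitively with a union-find structure, each cluster is replaced by its centroid, and edges that collapse inside a cluster are dropped.

```python
from math import hypot


def consolidate_nodes(coords, edges, tolerance):
    """Merge nodes within `tolerance` of each other.

    coords: {node: (x, y)}; edges: iterable of (u, v) pairs (undirected).
    """
    parent = {n: n for n in coords}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving keeps the trees shallow
            n = parent[n]
        return n

    nodes = list(coords)
    for i, a in enumerate(nodes):  # O(n^2) pairwise check, fine for a sketch
        for b in nodes[i + 1:]:
            (ax, ay), (bx, by) = coords[a], coords[b]
            if hypot(ax - bx, ay - by) <= tolerance:
                parent[find(a)] = find(b)

    clusters = {}
    for n in nodes:
        clusters.setdefault(find(n), []).append(n)

    # each cluster is represented by the centroid of its member nodes
    new_coords = {
        root: (
            sum(coords[m][0] for m in members) / len(members),
            sum(coords[m][1] for m in members) / len(members),
        )
        for root, members in clusters.items()
    }

    new_edges = set()
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:  # edges inside a cluster disappear
            new_edges.add(frozenset((ru, rv)))  # undirected and de-duplicated
    return new_coords, new_edges


# A small triangular "roundabout" (a, b, c) with one street leading to d.
coords = {"a": (0, 0), "b": (1, 0), "c": (0.5, 1), "d": (10, 0)}
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("b", "d")]
new_coords, new_edges = consolidate_nodes(coords, edges, tolerance=1.2)
print(len(new_coords), len(new_edges))
```

Because the sketch relies only on geometry and not on source-specific tags, it is source-agnostic in the sense described above. A real implementation would also need to rebuild edge geometries and handle parallel edges such as dual carriageways.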

Required/Preferred Skills

  • Familiarity with GeoPandas data structures (GeoSeries, GeoDataFrame) and NetworkX graphs (required)

Difficulty level

intermediate / advanced

Mentors

Project size

~350 hours

Resources

Some of the existing tools offer partial functionality:

Additional resources:

Expected outcomes:

A set of functions providing a parametrized simplification and topological correction of street networks.

Other

PySAL is an open source project, and as such we invite contributions from any interested developer. If you have an idea for an enhancement to PySAL, please contact one of the developers to discuss the possibilities for a project in GSoC 2022.
