Merge pull request #252 from jGaboardi/final_proof
a 'final' proofing for JOSS manuscript
jGaboardi committed Jun 13, 2022
2 parents 2351de9 + 64f0e9d commit abc3a76
Showing 2 changed files with 19 additions and 7 deletions.
12 changes: 12 additions & 0 deletions paper/paper.bib
@@ -305,3 +305,15 @@ @misc{maxcovr
url = {https://github.com/njtierney/maxcovr},
}

@article{scikitlearn,
title={{Scikit-learn: Machine Learning in Python}},
author={Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V.
and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P.
and Weiss, R. and Dubourg, V. and Vanderplas, J. and Passos, A. and
Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E.},
journal={Journal of Machine Learning Research},
volume={12},
pages={2825--2830},
year={2011},
url={https://www.jmlr.org/papers/v12/pedregosa11a.html}
}
14 changes: 7 additions & 7 deletions paper/paper.md
@@ -54,12 +54,12 @@ Spatial optimization is a major spatial analytical tool in management and planni

# Statement of need

Spatial optimization methods/algorithms can be accessed in many ways. ArcGIS[^arc] and TransCAD[^transcad] are two well-known commercial GIS software packages that provide modules designed for structuring and solving spatial optimization problems. The optimization functions they offer focus on a set classical single facility location methods (e.g., Weber, Median, Centroid, 1-center), routing and shortest path methods (e.g., shortest path on the network, least cost path over the terrain), and multi-facility location-allocation methods (e.g., coverage models, p-median problem). They are user-friendly and visually appealing, but the cost is relatively high [@murray2021contemporary].
Spatial optimization methods and algorithms can be accessed in many ways. ArcGIS[^arc] and TransCAD[^transcad] are two well-known commercial GIS software packages that provide modules designed for structuring and solving spatial optimization problems. The optimization functions they offer focus on a set of classical single facility location methods (e.g., Weber, Median, Centroid, 1-center), routing and shortest path methods (e.g., shortest path on the network, least cost path over the terrain), and multi-facility location-allocation methods (e.g., coverage models, p-median problem). They are user-friendly and visually appealing, but the cost is relatively high [@murray2021contemporary].

[^arc]: https://www.esri.com/en-us/home
[^transcad]: https://www.caliper.com/

Open-source software is another option to access spatial optimization. Although it may require users to have a certain level of programming experience, open-source software provides relatively novel and comprehensive methods, and more importantly, it is free and can be easily replicated. This is particularly true for regionalization and facility-location methods. Regionalization methods are limited in commercial GIS software, and may only have grouping analysis for vector data and region identification for raster data. On the contrary, there are many application-oriented open-source packages that facilitate the implementation of regionalization methods in various fields, including climate (e.g., HiClimR [@badr2015tool], synoptReg [@LEMUSCANOVAS2019114]), biology (e.g., Phyloregion [@daru2020phyloregion], regioneR [@regioneR2015]), hydrology (e.g., nsRFA [@nsRFA]), agricultural (e.g., OpenLCA[^openlca]), and so on. The functions of graph regionalization with clustering and partitioning have been provided by several packages in R[^R] such as Rgeoda [@rgeoda], maxcut: Max-Cut Problem in sdpt3r [@rahman2018sdpt3r], and RBGL: R Boost Graph Library [@RBGL], and grPartition within MATLAB[^matlab]. They are probably the most closely related projects to the regionalization functionality of **spopt**, however, they are written in R and MATLAB. For facility-location methods, commercial software such as TransCAD and ArcGIS implements models using a heuristic approach. However, they don't provide details about the solution found, which limits the interpretability of the results [@Chen2021]. On the other hand, existing open-source packages mostly aim at solving coverage problems such as pyspatialopt [@pyspatialopt], allagash [@allagash] and maxcovr [@maxcovr], but the available models, solvers, and overall accessibility vary significantly. 
Therefore, it is necessary to develop an open-source optimization package written in Python that includes various types of classic facility-location methods with a wide range of supported optimization solvers.
Open-source software is another option to access spatial optimization. Although it may require users to have a certain level of programming experience, open-source software provides relatively novel and comprehensive methods, and more importantly, it is free and can be easily replicated. This is particularly true for regionalization and facility-location methods. Regionalization methods are limited in commercial GIS software, and may only have grouping analysis for vector data and region identification for raster data. On the contrary, there are many application-oriented open-source packages that facilitate the implementation of regionalization methods in various fields, including climate (e.g., HiClimR [@badr2015tool], synoptReg [@LEMUSCANOVAS2019114]), biology (e.g., Phyloregion [@daru2020phyloregion], regioneR [@regioneR2015]), hydrology (e.g., nsRFA [@nsRFA]), agriculture (e.g., OpenLCA[^openlca]), and so on. The functions of graph regionalization with clustering and partitioning have been provided by several packages in R[^R] such as Rgeoda [@rgeoda], maxcut: Max-Cut Problem in sdpt3r [@rahman2018sdpt3r], and RBGL: R Boost Graph Library [@RBGL], and grPartition within MATLAB[^matlab]. They are probably the most closely related projects to the regionalization functionality of **spopt**; however, they are written in R and MATLAB. For facility-location methods, commercial software such as ArcGIS and TransCAD implement models using heuristic approaches. However, they don't provide details about the solution found, which limits the interpretability of the results [@Chen2021]. On the other hand, existing open-source packages mostly aim at solving coverage problems such as pyspatialopt [@pyspatialopt], allagash [@allagash] and maxcovr [@maxcovr], but the available models, solvers, and overall accessibility vary significantly. 
Therefore, it is necessary to develop an open-source optimization package written in Python that includes various types of classic facility-location methods with a wide range of supported optimization solvers.

[^openlca]: https://www.openlca.org
[^R]: https://www.r-project.org
@@ -74,7 +74,7 @@ Originating from the region module in PySAL, **spopt** is under active developme
3. Region-K-means: K-means clustering for regions with the constraint that each cluster forms a spatially connected component.
4. Automatic Zoning Procedure (AZP): the aggregation of data for a larger number of zones into a prespecified smaller number of regions based on a predefined type of objective function [@openshaw1977geographical;@openshaw1995algorithms].
5. Skater: a constrained spatial regionalization algorithm based on spanning tree pruning. Specifically, the number of edges is prespecified to be cut in a continuous tree to group spatial units into contiguous regions [@assunccao2006efficient].
6. WardSpatial: an agglomerative clustering (each observation starts in its own cluster, and pairs of clusters are chosen to merge at each step) using ward linkage (the goal is to minimize the variance of the clusters) with a spatial connectivity constraint ([sklearn.cluster.AgglomerativeClustering](sklearn.cluster.AgglomerativeClustering: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.AgglomerativeClustering.html)).
6. WardSpatial: an agglomerative clustering (each observation starts in its own cluster, and pairs of clusters are chosen to merge at each step) using ward linkage (the goal is to minimize the variance of the clusters) with a spatial connectivity constraint ([sklearn.cluster.AgglomerativeClustering](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.AgglomerativeClustering.html) [@scikitlearn]).

Take the functionality of Max-p-regions as an example. Other methods can be applied in a similar process, including importing the needed packages, inputting and reading data, defining the parameters, solving the model, and plotting the solution.

@@ -103,12 +103,12 @@ It results in five regions, three of which have six states, and two with seven s

For facility-location, four models, including two coverage models and two location-allocation models based on median and center problems, are developed using an exact approach with pulp [@mitchell2011pulp] providing an interface to installed solvers.

1. Location Set Covering Problem (LSCP): Finding the minimum number of facilities and their locations such that all demands are covered within the maximal distance or time standard [@Toregas1971].
2. Maximal Covering Location Problem (MCLP): Locating a prespecified number of facilities such that demand coverage within a maximal service distance or time is maximized [@Church1974].
1. Location Set Covering Problem (LSCP): Finding the minimum number of facilities and their locations such that all demands are covered within the maximal threshold of service distance or time [@Toregas1971].
2. Maximal Covering Location Problem (MCLP): Locating a prespecified number of facilities such that demand coverage within a maximal threshold of service distance or time is maximized [@Church1974].
3. P-Median Problem: Locating \textit{p} facilities and allocating the demand served by these facilities so that the total weighted assignment distance or time is minimized [@ReVelle1970].
4. P-Center Problem: Locating \textit{p} facilities and allocating the demand served by these facilities to minimize the maximum assignment distance or time between demands and their allocated facilities [@Hakimi1964].
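
To make two of the objectives listed above concrete, the following is a minimal sketch, not the **spopt** API and not the integer programming approach the manuscript describes. It solves tiny MCLP and P-Median instances by exhaustively enumerating every combination of candidate sites, which is exact but only practical for very small inputs. The function names, the distance matrix layout (`dist[i][j]` is the distance from demand `i` to candidate site `j`), and the toy data are all assumptions made for illustration.

```python
from itertools import combinations


def mclp_brute_force(dist, weights, p, radius):
    """Pick p sites that maximize the demand weight covered within radius.

    Hypothetical helper: enumerates all site subsets of size p.
    """
    n_sites = len(dist[0])
    best_sites, best_covered = None, -1
    for sites in combinations(range(n_sites), p):
        # A demand counts as covered if any chosen site is within the radius.
        covered = sum(
            w
            for i, w in enumerate(weights)
            if any(dist[i][j] <= radius for j in sites)
        )
        if covered > best_covered:
            best_sites, best_covered = sites, covered
    return best_sites, best_covered


def p_median_brute_force(dist, weights, p):
    """Pick p sites that minimize total weighted distance to the nearest site.

    Hypothetical helper: enumerates all site subsets of size p.
    """
    n_sites = len(dist[0])
    best_sites, best_cost = None, float("inf")
    for sites in combinations(range(n_sites), p):
        # Each demand is assigned to its closest chosen site.
        cost = sum(
            w * min(dist[i][j] for j in sites) for i, w in enumerate(weights)
        )
        if cost < best_cost:
            best_sites, best_cost = sites, cost
    return best_sites, best_cost


# Toy data (assumed): demands and candidate sites placed along a line.
demand_positions = [0, 1, 2, 8, 9, 10]
weights = [5, 3, 2, 4, 1, 6]
site_positions = [1, 5, 9]
dist = [[abs(d - s) for s in site_positions] for d in demand_positions]

print(mclp_brute_force(dist, weights, p=2, radius=1))
print(p_median_brute_force(dist, weights, p=2))
```

Enumeration grows combinatorially with the number of candidate sites, which is one reason exact formulations passed to a dedicated solver are preferred for realistic problem sizes.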

For example, Maximal Covering Location Model functionality is used to select 4 out of 16 store sites in the San Francisco area to maximize demand coverage, as shown in \autoref{fig: mclp}. Other facility-location methods can be applied in a similar way. Moreover, we have included functionality to place pre-determined site locations within the facility selection pool, allowing for realistic scenarios whereby 1 or more *new* facilities can be added to an *existing* set of locations. We believe this feature of intermingled new and exisitng facility location siting to be the first implementation in open-source optmization software.
For example, Maximal Covering Location Model functionality is used to select 4 out of 16 store sites in the San Francisco area to maximize demand coverage, as shown in \autoref{fig: mclp}. Other facility-location methods can be applied in a similar way. Moreover, we have included functionality to place pre-determined site locations within the facility selection pool, allowing for realistic scenarios whereby 1 or more *new* facilities can be added to an *existing* set of locations. We believe this feature of intermingled new and existing facility location siting to be the first implementation in open-source optimization software.

![The solution of MCLP while siting 4 facilities using 5 kilometers as the maximum service distance between facilities and demand locations. See the "Real World Facility Location" tutorial ([https://pysal.org/spopt/notebooks/facloc-real-world.html](https://pysal.org/spopt/notebooks/facloc-real-world.html)) for more details.\label{fig: mclp}](figs/mclp.png)

@@ -117,7 +117,7 @@ For example, Maximal Covering Location Model functionality is used to select 4 o
**Spopt** is under active development and the developers look forward to your extensive attention and participation. In the near future, there are three major enhancements we plan to pursue for **spopt**:

1. The first stream will be on the enhancement of regionalization algorithms by including several novel extensions of the classical regionalization models, such as the integration of spatial data uncertainty and the shape of identified regions in the Max-p-regions problem.
2. The second direction involves adding capacity constraints and includes a polygon partial coverage on facility location models. No commercial and open-source software has provided these features before.
2. The second direction involves adding capacity constraints and polygon partial coverage to facility location models. No commercial or open-source software has provided these features to date.
3. We anticipate adding functionality for solving traditional routing and transportation-oriented optimization problems. Initially, this will come in the form of integer programming formulations of the Travelling Salesperson Problem [@miller1960integer] and the Transportation Problem [@koopmans1949optimum].

# Acknowledgements
