Alex Reppel 15 February, 2021
The aim of this tutorial1 is to demonstrate how easy it is to create interactive maps in R using the Leaflet JavaScript framework The link between R and Leaflet is provided by Leaflet for R.
Police forces in the UK make available a wide variety of policing-related data.
Data for England, Wales, and Northern Ireland is published at data.police.uk under an Open Government Licence v3.0.
The available data on data.police.uk is
separated by year and police force. The first step is to download data
files for 2020 for a number of police forces across England and Wales
and to store them in a directory (data/
) where R can find them.
Because all of these files end with “-street.csv”, the Sys.glob()
function can be used to locate all files matching this pattern and
store the result in the variable data_files
. (The *
place holder
means “any combination of letters”.)
data_files <- Sys.glob(
"data/*/*-street.csv")
Each file contains data we want to combine into a single data file. We
do that using the ldply()
function from the tidyverse()
package. The
result is an R data
frame
stored in the variable data
. (A data frame is similar to a matrix
with rows and columns.)
data <- ldply(
data_files,
read_csv)
Once every data file is read and combined, the resulting data frame
(called data
) contains a total of 5,162,703 records (or Rows
).
The glimpse()
function allows us to quickly, well, glimpse into the
data frame.
## Rows: 5,162,703
## Columns: 12
## $ `Crime ID` <chr> NA, "be92f870c8b560fccc562407cc3537c7e0a2df76c…
## $ Month <chr> "2020-01", "2020-01", "2020-01", "2020-01", "2…
## $ `Reported by` <chr> "Avon and Somerset Constabulary", "Avon and So…
## $ `Falls within` <chr> "Avon and Somerset Constabulary", "Avon and So…
## $ Longitude <dbl> -2.509384, -2.494870, -2.494870, -2.494870, -2…
## $ Latitude <dbl> 51.40959, 51.42228, 51.42228, 51.42228, 51.422…
## $ Location <chr> "On or near Barnard Walk", "On or near Confere…
## $ `LSOA code` <chr> "E01014399", "E01014399", "E01014399", "E01014…
## $ `LSOA name` <chr> "Bath and North East Somerset 001A", "Bath and…
## $ `Crime type` <chr> "Anti-social behaviour", "Burglary", "Burglary…
## $ `Last outcome category` <chr> NA, "Status update unavailable", "Status updat…
## $ Context <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
Our data frame consists of 12 columns:
colnames(data)
## [1] "Crime ID" "Month" "Reported by"
## [4] "Falls within" "Longitude" "Latitude"
## [7] "Location" "LSOA code" "LSOA name"
## [10] "Crime type" "Last outcome category" "Context"
It may seem inefficient to first read all data into a single data-frame, only to remove several columns. While this isn’t the most efficient way of working with a large number of data files, it is my preferred way to explore a dataset and to decide which columns to keep.
For this exercise, we keep five columns:
- Month
- Longitude
- Latitude
- Crime type
- Last outcome category
The columns to keep from the original data frame are stored in the
columns
variable and the new labels for those columns are stored in
the column_names
variable.
columns <- c('Month', 'Longitude', 'Latitude', 'Crime type', 'Last outcome category')
column_names <- c('DATE', 'LNG', 'LAT', 'CRIME', 'OUTCOME')
data <- data %>%
select(all_of(columns))
colnames(data) <- column_names
The Last outcome category
stores one of the standard outcome
categories used by the Police. It is the most recent known outcome for
an incident at the time the data was downloaded from
data.police.uk.
An important manual step is to classify the standard outcome labels used by the Police.
For this exercise, we collate them into four categories:
- Investigation closed incidents are no longer pursued but remain unresolved
- Investigation concluded incidents resulted in some sort of resolution
- Investigation ongoing describes incidents still under investigation
- Current status unavailable describes incidents where no results are available
Here, these categories are referred to as status
and will be used to
format the legend of the graph, as well as the colour scheme used to
highlight individual incidents on the map later on.
# Example:
# The status "Investigation complete; no suspect identified" is used by the Police
# and is merged here into the variable "STATUS_CLOSED". (Together with others, such
# as "Unable to prosecute suspect" for example.)
STATUS_CLOSED <- c(
"Investigation complete; no suspect identified",
"Unable to prosecute suspect",
"Formal action is not in the public interest",
"Further action is not in the public interest",
"Further investigation is not in the public interest"
)
STATUS_CONCLUDED <- c(
"Action to be taken by another organisation",
"Awaiting court outcome",
"Local resolution",
"Offender given a caution",
"Offender given a drugs possession warning",
"Offender given community sentence",
"Offender given penalty notice",
"Suspect charged as part of another case"
)
STATUS_ONGOING <- c(
"Under investigation"
)
STATUS_UNCLEAR <- c(
"Court result unavailable",
"Status update unavailable"
)
data$STATUS <- NA
assign_outcome_status <- function(x) {
if (is.na(x)) {
return('Current status unavailable')
} else if (x %in% STATUS_CLOSED) {
return('Investigation closed')
} else if (x %in% STATUS_CONCLUDED) {
return('Investigation concluded')
} else if (x %in% STATUS_ONGOING) {
return('Investigation ongoing')
} else {
return('Current status unavailable')
}
}
data$STATUS <- unlist(
lapply(
data$OUTCOME,
FUN = assign_outcome_status
)
)
The year and month each incident took place is recorded and stored in
the Date
variable (or DATE
as we have renamed it in our dataset). It
may be useful to further separate DATE
into YEAR
and MONTH
by
creating two more columns.
data <- data %>%
separate(DATE, c('YEAR', 'MONTH'), '-')
data$YEAR <- as.integer(data$YEAR)
data$MONTH <- as.integer(data$MONTH)
A couple of additional preparatory steps are required before we can generate our map.
First, we exclude incidents in the dataset that do not contain either
Latitude
or Longitude
coordinates.
data <- filter(
data,
is.na(LNG) == FALSE,
is.na(LAT) == FALSE,
)
Second, the crime
variable in the dataset includes a total of 14
unique categories, ranging from “Anti-social behaviour” and “Bicycle
theft” to “Robbery” and other serious crimes.
For this exercise, we want to visualise a single category and also limit the time-frame from which we collect incidents.
WHICH_YEAR <- 2020
WHICH_MONTH_FROM <- 1
WHICH_MONTH_TO <- 6
WHICH_CRIME <- 'Bicycle theft'
We are interested in Bicycle theft across England and Wales. To limit the number of instances in our interactive map to a more manageable size, we chose only incidents between January and June 2020.
We can quickly confirm this by looking at the first few entries (using
the head()
function) of our dataset showing only records matching the
Bicycle theft crime category.
The separation of DATE
into YEAR
and MONTH
(see above) helps us
limit the time-frame of incidents we want to map.
map_data <- filter(
data,
YEAR == WHICH_YEAR & MONTH >= WHICH_MONTH_FROM & MONTH <= WHICH_MONTH_TO,
CRIME == WHICH_CRIME
)
The filter
operation has reduced the number of incidents considerably;
down from 5,071,884 to 30,325.
## Rows: 30,325
## Columns: 7
## $ YEAR <int> 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 20…
## $ MONTH <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ LNG <dbl> -2.502878, -2.391092, -2.358051, -2.362118, -2.368555, -2.3258…
## $ LAT <dbl> 51.40628, 51.39719, 51.37753, 51.38486, 51.38119, 51.35826, 51…
## $ CRIME <chr> "Bicycle theft", "Bicycle theft", "Bicycle theft", "Bicycle th…
## $ OUTCOME <chr> "Status update unavailable", "Status update unavailable", "Sta…
## $ STATUS <chr> "Current status unavailable", "Current status unavailable", "C…
Now we have all the data we need to create our map.
We begin with a basemap
that defines the boundaries of the visible
area. For this, we use the minumum and maximum coordinates in our
dataset.
basemap <- leaflet(map_data) %>%
fitBounds(
min(map_data$LNG),
min(map_data$LAT),
max(map_data$LNG),
max(map_data$LAT))
Before we map our data, we have to decide on how our map should look
like. For this, Leaflet provides a wide variety
of
Tiles
that can be added to the basemap
.
A fun example are Stamen’s watercolour tiles:
watercolour_map <- basemap %>%
addProviderTiles(
providers$Stamen.Watercolor)
Although these look beautiful, they lack some detail we are interested in for this example. A possibly more appropriate version is the default Tile format:
basemap <- leaflet(map_data) %>%
addProviderTiles(
providers$CartoDB.Positron) %>%
fitBounds(
min(map_data$LNG),
min(map_data$LAT),
max(map_data$LNG),
max(map_data$LAT))
The next step is to create labels and to specify a colour palette
for the different outcomes (which we stored earlier in the STATUS
column).
The following two code snippets (a) add a count in the form of (n=??)
and (b) assign a unique colour to each outcome category.
format_labels <- function(n) {
return(
paste(
n,
' (n=',
format(
as.integer(LABELS[n]),
big.mark=','),
')',
sep = ''
)
)
}
LABELS <- sort(table(map_data$STATUS))
LABELS <- lapply(names(LABELS), format_labels)
# based on the ColorBrewer 'PRGn' palette.
map_palette <- c(
'#bdbdbd', # grey
'#a6cee3', # light blue
'#33a02c', # green
'#1f78b4' # blue
)
map_palette <- map_palette[1:length(LABELS)]
set_colour <- colorFactor(
map_palette,
domain = NULL
)
Leaflet also allows to customise the markers used to indicate where an item has been placed on the map, or rather where an incident took place. We use circles with an opacity level of 35% to ensure that areas with a large number of incidents are shown in more intense colours (because these are overlapping, thus decreasing opacity).
map <- basemap %>%
addCircleMarkers(
~LNG,
~LAT,
stroke = FALSE,
fillOpacity = .35,
color = ~set_colour(
STATUS),
radius = ~ifelse(
STATUS %in% c(
'Investigation concluded',
'Awaiting court outcome',
'Investigation ongoing'
), 6, 2)
)
We can now print the legend of different outcomes that we prepared
earlier. The legend we use for this example essentially shows a status
summary of investigations.
map <- map %>%
addLegend(
'topright',
colors = map_palette,
labels = sort(unlist(LABELS)),
title = paste(
unique(map_data$CRIME),
's: ',
format(count(map_data), big.mark=','),
' (', month.name[min(map_data$MONTH)],
' to ', month.name[max(map_data$MONTH)],
' ', min(map_data$YEAR),
')',
sep = ''
)
)
Although we have already added marker for each reported incident onto the map, it may be beneficial to add this information twice, once as individual circles and once again as part of a cluster.
This is potentially misleading as each incident appears twice on the map. But it has its advantages as it is now much easier to see a more nuanced view of incident clusters: Once as actual cluster (represented by the large circles showing the number of incidents each of them represents) and once again distributed across the map (represented by small circles, some of them overlapping).
cluster_map <- map %>%
addMarkers(
~LNG,
~LAT,
popup = paste(
'<h3 style="color:', set_colour(map_data$STATUS),
';text-align:center;">', toupper(map_data$STATUS),'</h3>',
'<hr /><strong>Crime:</strong>', map_data$CRIME, '<br />',
'<strong>Status:</strong>', map_data$OUTCOME, '<br /><hr />',
'<p style="text-align:right;"><i>', month.name[map_data$MONTH], map_data$YEAR, '</i></p>'
),
clusterOptions = markerClusterOptions()
)
(The interactive version of this map is not viewable on GitHub; knit the .Rmd script directly to HTML.)
Interactive data maps produced with Leaflet for R. Data from data.police.uk published under the Open Government Licence v3.0.
- Andrew Ba Tran’s (2018) “Interactive Maps with Leaflet” is a short and very accessible tutorial. It is part of a chapter on “Spatial analysis.”
- Chapter 3 of Claudia Engel’s (2019) book “Using Spatial Data with R” shows a number of ways of how to make maps in R.
- Carson Farmer & Leah Wasser’s (2019) “Creating Interactive Spatial Maps in R Using Leaflet” tutorial demonstrates alternative ways of formatting markers on interactive spatial maps.
- Kasia Kulma’s (2018) “Exploring London Crime with R heat maps” reads data via the data.police.uk API and combines it with scraped data from Wikipedia to create a variety of heat maps, including geographic heat maps for London.
- The tmap package to create “thematic maps in R” (Tennekes 2018).
- Andy Woodruff, Ryan Mullins, and Cristen Jones Leaflet tutorial for Maptime Boston.
- Conlen, Matthew, John Keefe, Lauren Leatherby and Charlie Smart (2020) “How Full Are Hospital I.C.U.s Near You?, The New York Times, Dec. 15, 2020 [accessed Dec. 16, 2020.]
Tennekes, Martijn. 2018. “Tmap: Thematic Maps in R.” Journal of Statistical Software, Articles 84 (6): 1–39. https://doi.org/10.18637/jss.v084.i06.