Codebase underlying the emergency rental assistance prioritization index version 2.0.

UrbanInstitute/emergency-rental-assistance-priority-index

Emergency Rental Assistance Priority (ERAP) Index 2.0

This repository contains the data and code necessary to generate the census tract-level Emergency Rental Assistance Priority Index that powers an interactive Urban Institute feature.

The code for the front-end interactive feature is available here. The final data can be downloaded from the project's Data Catalog page. For additional details on the development of the Index, refer to the project's Technical Report.

The ERAP Index 2.0 is composed of three subindices: the Housing Subindex, the Household Demographics Subindex, and the Income Subindex. Each of the subindices contains multiple indicators; when combined, they produce an overarching Index score that reflects the tract-level need for emergency rental assistance. The indicators and the corresponding data sources are listed below.

Indicators

Housing Subindex

  • Share of renter-occupied housing units: share of all occupied units that are occupied by renters
  • Share of renter-occupied housing units in multi-unit buildings: share of all renter-occupied units that are in structures with more than one unit
  • Median monthly housing cost: median monthly housing cost of all occupied housing units with monthly housing costs
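The Housing Subindex ratios above can be sketched in Python for illustration (the repository itself computes them in R from ACS tables). This is a minimal sketch under stated assumptions: the input field names are hypothetical stand-ins rather than ACS variable codes, median monthly housing cost is assumed to be used as reported, and a zero denominator yields None rather than an imputed value.

```python
from dataclasses import dataclass
from typing import Dict, Optional


def safe_share(numerator: float, denominator: float) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    if denominator == 0:
        return None
    return numerator / denominator


@dataclass
class HousingCounts:
    # Hypothetical tract-level inputs; names are illustrative, not ACS variable codes.
    occupied_units: int
    renter_units: int
    renter_units_multiunit: int  # renter-occupied units in structures with 2+ units
    median_monthly_housing_cost: float  # assumed to be used as reported


def housing_indicators(c: HousingCounts) -> Dict[str, Optional[float]]:
    return {
        # renter-occupied units / all occupied units
        "share_renter_occupied": safe_share(c.renter_units, c.occupied_units),
        # renter-occupied units in multi-unit structures / all renter-occupied units
        "share_renter_multiunit": safe_share(c.renter_units_multiunit, c.renter_units),
        "median_monthly_housing_cost": c.median_monthly_housing_cost,
    }
```

For a tract with 1,000 occupied units, 400 of them renter-occupied and 300 of those in multi-unit buildings, this gives shares of 0.4 and 0.75.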

Household Demographics Subindex

  • Average renter household size: number of people in renter households divided by the number of renter households
  • Share of Black individuals: share of all individuals who identify as Black and do not identify as Hispanic or Latino
  • Share of Asian individuals: share of all individuals who identify as Asian and do not identify as Hispanic or Latino
  • Share of Latine individuals: share of all individuals who identify as Hispanic or Latino
  • Share of Indigenous, Pacific Islander, or multiracial individuals: share of all individuals who identify as Indigenous, Pacific Islander, or multiracial and do not identify as Hispanic or Latino
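A similar Python sketch for the Household Demographics Subindex, again with hypothetical names: average renter household size divides people in renter households by renter households, and each race and ethnicity share divides a group count by total population. The group keys are illustrative and assume mutually exclusive categories matching the definitions above.

```python
from typing import Dict, Optional


def safe_share(numerator: float, denominator: float) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    return None if denominator == 0 else numerator / denominator


def demographic_indicators(
    renter_household_population: int,
    renter_households: int,
    total_population: int,
    group_counts: Dict[str, int],
) -> Dict[str, Optional[float]]:
    """Compute the Household Demographics Subindex ratios for one tract.

    group_counts holds hypothetical, mutually exclusive population counts such as
    "black_nh", "asian_nh", "latine", and "indigenous_pi_multi_nh", where "_nh"
    means "and not Hispanic or Latino". The combined Indigenous, Pacific Islander,
    or multiracial group is assumed to be summed before it is passed in.
    """
    indicators: Dict[str, Optional[float]] = {
        # people in renter households / number of renter households
        "avg_renter_household_size": safe_share(renter_household_population, renter_households),
    }
    for group, count in group_counts.items():
        # every group share uses the tract's total population as the denominator
        indicators[f"share_{group}"] = safe_share(count, total_population)
    return indicators
```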

Income Subindex

  • Share of cost-burdened renter households: share of renter households with incomes of less than $35,000 that are paying 50 percent or more of their incomes on rent
  • Share of extremely low-income renter households: share of all renter households with incomes at or below 30 percent of the HUD area median family income
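The two Income Subindex definitions can be illustrated as below. Tract-level ACS and CHAS data arrive as tabulated counts rather than household records, so the per-household rule here only illustrates the thresholds (income strictly below $35,000 and rent of 50 percent or more of income). The function names are hypothetical, and excluding households with zero or negative income is an assumption of this sketch.

```python
from typing import Optional


def is_cost_burdened_low_income_renter(
    annual_income: float,
    annual_rent: float,
    income_cutoff: float = 35_000.0,
    burden_threshold: float = 0.50,
) -> bool:
    """Per-household version of the cost-burden rule, for illustration only."""
    if annual_income <= 0:
        # Rent as a share of income is undefined; this sketch excludes such households.
        return False
    # Income strictly below the cutoff, and rent at or above 50 percent of income.
    return annual_income < income_cutoff and annual_rent / annual_income >= burden_threshold


def share_extremely_low_income(eli_renter_households: int, renter_households: int) -> Optional[float]:
    """Share of renter households at or below 30 percent of HAMFI, from CHAS-style counts."""
    if renter_households == 0:
        return None
    return eli_renter_households / renter_households
```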

Scripts

  • 001_generate_full_index.qmd: this calls the scripts below to generate the full ERAP Index.

  • generate_unweighted_indicators.R: this compiles raw data from various sources into a single dataframe of indicators.

  • list_census_index_vars.R: this returns a character vector of American Community Survey (ACS) variable names.

  • get_census_index_vars.R: this downloads and formats data from the American Community Survey and calculates the indicators described above, with the exception of the share of extremely low-income renter households, which is derived from the CHAS data described below.

  • get_eviction_vars.R: this downloads, formats, and returns 2018 eviction filing counts by census tract from the Eviction Lab at Princeton University.

  • get_chas_index_vars.R: this downloads, formats, and returns, for each census tract, the number of renter households and the number of renter households with incomes at or below 30 percent of the HUD Area Median Family Income (HAMFI), from HUD's Comprehensive Housing Affordability Strategy (CHAS) dataset.

  • impute_2010_2020_tracts_areal.R: this allocates data at the 2010 Census tract level to 2020 Census tracts via area-based interpolation, using data provided by the Census Bureau. The Census Bureau does not publish formal definitions for these field names, so we describe the fields used for imputation below:

    • GEOID_TRACT_20: the 2020 tract GEOID.
    • GEOID_TRACT_10: the 2010 tract GEOID.
    • AREALAND_TRACT_20: the area of the 2020 tract geography that is land (as opposed to water).
    • AREALAND_TRACT_10: the area of the 2010 tract geography that is land (as opposed to water).
    • AREALAND_PART: the land area shared by an intersecting 2020 tract and 2010 tract, i.e., the land area of the 2020 tract that falls within the 2010 tract (equivalently, the land area of the 2010 tract that falls within the 2020 tract).
    • perc_2010tractlandarea_in_2020tractlandarea: the land area of the intersection between a given 2010 tract and 2020 tract divided by the 2010 tract land area.
  • get_wqs_scores.R: this formats the index to include only our selected indicators, groups the indicators into their associated subindices, and calculates the weight of each subindex so as to maximize the correlation between the overall index score and evictions. It also calculates the z-score and percentile ranking for each indicator and each subindex.
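The area-based interpolation performed by impute_2010_2020_tracts_areal.R can be sketched as follows, in Python rather than the repository's R. The sketch assumes the relationship fields described above, that the variable being moved is a count, and that values are spread evenly over each 2010 tract's land area; the function name is hypothetical. Each 2020 tract receives every intersecting 2010 tract's value multiplied by the share of that 2010 tract's land area falling inside it.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Mapping


def areal_interpolate(
    relationship_rows: Iterable[Mapping[str, Any]],
    values_2010: Mapping[str, float],
) -> Dict[str, float]:
    """Reallocate a 2010-tract count variable to 2020 tracts by land-area share."""
    totals: Dict[str, float] = defaultdict(float)
    for row in relationship_rows:
        geoid_10 = str(row["GEOID_TRACT_10"])
        geoid_20 = str(row["GEOID_TRACT_20"])
        land_10 = float(row["AREALAND_TRACT_10"])
        if geoid_10 not in values_2010 or land_10 == 0:
            # Nothing to allocate: no 2010 estimate, or a tract with no land area.
            continue
        # Equivalent to perc_2010tractlandarea_in_2020tractlandarea.
        weight = float(row["AREALAND_PART"]) / land_10
        # Each 2020 tract accumulates the land-area-weighted portion of every
        # 2010 tract that intersects it.
        totals[geoid_20] += values_2010[geoid_10] * weight
    return dict(totals)
```

Assuming each 2010 tract's land parts sum to its full land area, the weights for a 2010 tract sum to one and a count's overall total is preserved. Ratio indicators would need different treatment, such as interpolating the numerator and denominator separately and dividing afterward.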

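The scoring steps in get_wqs_scores.R can be approximated in Python as below. This does not reproduce the gWQS estimation of the weights; it assumes weights have already been estimated (constrained, as in weighted quantile sum regression, to be nonnegative and to sum to one) and shows how z-scores, 1-100 percentile rankings, and a weighted combination of subindex scores might be computed. The percentile rule (rank divided by count, rounded up, with ties taking the highest rank) and the use of the sample standard deviation are assumptions of this sketch, not necessarily the repository's choices.

```python
import math
from bisect import bisect_right
from statistics import mean, stdev
from typing import Dict, List, Sequence


def z_scores(values: Sequence[float]) -> List[float]:
    """Standardize values using the sample standard deviation (n - 1), as R's sd() does."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def percentile_ranks(values: Sequence[float]) -> List[int]:
    """Map each value to a 1-100 percentile by rank; tied values share the highest rank."""
    ordered = sorted(values)
    n = len(ordered)
    # bisect_right counts how many values are <= v, i.e. v's (highest) rank.
    return [math.ceil(100 * bisect_right(ordered, v) / n) for v in values]


def weighted_index(
    subindex_scores: Dict[str, Sequence[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Combine per-tract subindex scores using nonnegative weights that sum to 1."""
    if any(w < 0 for w in weights.values()) or not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must be nonnegative and sum to 1")
    n_tracts = len(next(iter(subindex_scores.values())))
    return [
        sum(weights[name] * subindex_scores[name][i] for name in weights)
        for i in range(n_tracts)
    ]
```

With this percentile rule, a large set of distinct values maps to a roughly flat distribution from 1 to 100.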
Created Datasets and Caching

  • When the write_cache parameter (used across multiple scripts) is set to TRUE, the function writes a copy of the data to a local folder in the repository, data/intermediate-data, for future use.

  • When the read_cache parameter (used across multiple scripts) is set to TRUE, the function first attempts to read a local copy of the data and pulls the data from the remote source only if no local copy exists, which saves time.
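The read_cache and write_cache behavior described above follows a common read-through caching pattern. A minimal Python sketch, assuming JSON files and a hypothetical get_with_cache() helper; the repository's R functions and file formats may differ.

```python
import json
from pathlib import Path
from typing import Any, Callable, Union

# Cache location named in this README; relative to the repository root.
CACHE_DIR = Path("data/intermediate-data")


def get_with_cache(
    name: str,
    fetch: Callable[[], Any],
    read_cache: bool = False,
    write_cache: bool = False,
    cache_dir: Union[str, Path] = CACHE_DIR,
) -> Any:
    """Return cached data when allowed and present; otherwise fetch (and optionally cache) it."""
    path = Path(cache_dir) / f"{name}.json"
    if read_cache and path.exists():
        # Local copy exists: skip the remote download entirely.
        return json.loads(path.read_text())
    data = fetch()
    if write_cache:
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(data))
    return data
```

Setting read_cache to FALSE and write_cache to TRUE, as recommended below when updating data years, forces a fresh pull while refreshing the local copies.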

Rerunning the Index with Data from Different Years

  • Whether you are updating the index with more current data or producing the index for a previous year, the following steps outline the process of producing the ERAP Index for different years:
    • Create a new branch on the public-facing repository (named, e.g., "[YEAR]-updates")
    • In /scripts/001_generate_full_index.qmd:
      • Update the dataset years for the ACS and CHAS data in the second chunk (named parameters)
      • Ensure the caching parameters are set so that no data are read from the cache (read_cache_all = FALSE) and all data are written to the cache (write_cache_all = TRUE)
      • Run all chunks
    • The updated datasets will be cached in a local directory within the repository. Read in these datasets (the ACS and CHAS data) and check, at a minimum, the following:
      • Records join accurately to 2020-vintage census tracts (or to whichever tract vintage is most current for your data)
      • There is limited or no missingness in fields of interest
      • For ACS data, ensure that variable names or construction have not changed over time; variable alignment can be checked using: https://www2.census.gov/data/api-documentation/2022-5yr-api-changes.csv (or the equivalent for the given year)
    • Re-run the correlations between all index indicators and the evictions data (2018 vintage as of this writing in 2024); the correlations should be broadly similar to those from the previous run, and if they are not, investigate before proceeding
    • Quality-check the final outputted datasets
      • Run skimr::skim() and look at minima and maxima values for all variables, as well as rates of missingness (which should be zero or very low for all variables)
      • Check the distributions of any percentiles derived from z-scores; each should be roughly flat from 1 to 100
      • Check that the number of records in each dataset is as anticipated (roughly one record per 2020-vintage tract)
  • Potential errors/issues when updating the ERAP Index with different years
    • The generate_unweighted_indicators.R script may request ACS or CHAS years that are unavailable. Double-check that the census_year and chas_year parameters are set to years for which each source has data available
    • CHAS and ACS data from different years may use different prefixes in each census tract's GEOID, which can cause errors when joining the two datasets. Ensure that the different GEOID prefixes (e.g., "14000US" vs. "1400000US") are accounted for in generate_unweighted_indicators.R and removed appropriately before joining.
    • With the updated 2016-2020 CHAS data, areal imputation from 2010-vintage to 2020-vintage census tracts is no longer needed, because both the ACS and CHAS data now use 2020-vintage tracts
      • Ensure that impute_2010_2020_tracts_areal() is not run on either dataset unless you are intentionally running the scripts with pre-2020 data
    • Subindex weights produced by the gWQS package may differ depending on which version of the package is installed; the lockfile is the source of truth for exact reproducibility
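One way to guard against the GEOID prefix mismatch described above is to strip any leading run of digits followed by "US" from both sources before joining, and to report tracts that fail to match. The sketch below is in Python; the "geoid" field name and the helper names are hypothetical.

```python
import re
from typing import Any, Dict, List, Mapping, Sequence, Tuple

# Any leading run of digits followed by "US" (a summary-level prefix).
_GEOID_PREFIX = re.compile(r"^\d+US")


def strip_geoid_prefix(geoid: str) -> str:
    """Return the bare tract GEOID (state, county, and tract digits)."""
    return _GEOID_PREFIX.sub("", geoid.strip())


def join_on_tract(
    acs_rows: Sequence[Mapping[str, Any]],
    chas_rows: Sequence[Mapping[str, Any]],
) -> Tuple[List[Dict[str, Any]], List[str]]:
    """Inner-join two record lists on the normalized 'geoid' field.

    Returns the joined records plus the ACS tract GEOIDs with no CHAS match,
    so join failures are visible instead of silently dropped.
    """
    chas_by_tract = {strip_geoid_prefix(r["geoid"]): r for r in chas_rows}
    joined: List[Dict[str, Any]] = []
    unmatched: List[str] = []
    for acs in acs_rows:
        tract = strip_geoid_prefix(acs["geoid"])
        chas = chas_by_tract.get(tract)
        if chas is None:
            unmatched.append(tract)
            continue
        record: Dict[str, Any] = {"tract": tract}
        record.update({k: v for k, v in acs.items() if k != "geoid"})
        record.update({k: v for k, v in chas.items() if k != "geoid"})
        joined.append(record)
    return joined, unmatched
```

A non-empty unmatched list is a quick signal that prefixes or tract vintages still differ between the two sources.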

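Several of the quality checks listed above can be scripted. In R, skimr::skim() covers minima, maxima, and missingness; the Python sketch below shows comparable checks for missingness rates, percentile flatness, and record counts. The function names and the default tolerance are assumptions, and the expected tract count must be supplied for the data vintage in use.

```python
from typing import Any, Dict, List, Mapping, Sequence


def missingness(rows: Sequence[Mapping[str, Any]], fields: Sequence[str]) -> Dict[str, float]:
    """Share of records where each field is missing (absent or None)."""
    if not rows:
        raise ValueError("no records to check")
    return {f: sum(1 for r in rows if r.get(f) is None) / len(rows) for f in fields}


def percentile_histogram(percentiles: Sequence[int], bins: int = 10) -> List[int]:
    """Count 1-100 percentiles into equal-width bins (1-10, 11-20, ... for 10 bins)."""
    width = 100 / bins
    counts = [0] * bins
    for p in percentiles:
        if not 1 <= p <= 100:
            raise ValueError(f"percentile out of range: {p}")
        counts[min(int((p - 1) // width), bins - 1)] += 1
    return counts


def record_count_ok(n_records: int, expected: int, tolerance: float = 0.01) -> bool:
    """True if the record count is within a relative tolerance of the expected tract count."""
    return abs(n_records - expected) / expected <= tolerance
```

A flat percentile distribution puts roughly the same number of tracts in each bin; a spike in one bin usually points to ties or missing values upstream.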
Questions?

Reach out to wcurrangroome@urban.org.
