markvanderloo/simputation

simputation

An R package to make imputation simple. Currently supported methods include

  • Model based (optionally add [non-]parametric random residual)
    • linear regression
    • robust linear regression (M-estimation)
    • ridge/elasticnet/lasso regression (version >= 0.2.1)
    • CART models
    • Random forest
  • Model based, multivariate
    • Imputation based on EM-estimated parameters (version >= 0.2.1)
    • missForest (version >= 0.2.1)
  • Donor imputation (including various donor pool specifications)
    • k-nearest neighbour (based on Gower's distance)
    • sequential hotdeck (LOCF, NOCB)
    • random hotdeck
    • Predictive mean matching
  • Other
    • (groupwise) median imputation (optional random residual)
    • Proxy imputation (copy from other variable)

Installation

To install simputation together with all packages needed to support the various imputation models, run the following in R.

install.packages("simputation", dependencies=TRUE)

To install the development version, clone the repository and run make from its root directory.

git clone https://github.com/markvanderloo/simputation
cd simputation
make install

Example usage

Create some data with missing values.

library(simputation) # current package

dat <- iris
# empty a few fields
dat[1:3,1] <- dat[3:7,2] <- dat[8:10,5] <- NA
head(dat,10)

Now impute Sepal.Length and Sepal.Width by regression on Petal.Length and Species, then impute Species with a CART model that uses all other variables as predictors (including, in this case, the variables imputed in the previous step).

dat |>
  impute_lm(Sepal.Length + Sepal.Width ~ Petal.Length + Species) |>
  impute_cart(Species ~ .) |> # use all variables except 'Species' as predictors
  head(10)
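
The same formula interface applies to the other methods in the list above. The following is a minimal sketch rather than a definitive recipe. It assumes that `impute_lm` accepts `add_residual = "normal"` to add a random residual, and that `impute_median` treats the right-hand side of the formula as grouping variables, as described in the package documentation. The object names `imp_lm` and `imp_med` are chosen for illustration only.

```r
library(simputation)

dat <- iris
# empty a few numeric fields
dat[1:3, 1] <- dat[3:7, 2] <- NA

# Regression imputation with a random residual added to each prediction.
# Assumption: add_residual = "normal" draws residuals from a normal distribution.
set.seed(1)  # the residuals are random, so fix the seed for reproducibility
imp_lm <- impute_lm(dat, Sepal.Length ~ Petal.Length + Species,
                    add_residual = "normal")

# Groupwise median imputation.
# Assumption: the right-hand side (Species) defines the groups.
imp_med <- impute_median(dat, Sepal.Width ~ Species)

# Count the missing values left in the imputed columns
sum(is.na(imp_lm$Sepal.Length))
sum(is.na(imp_med$Sepal.Width))
```

Each call imputes only the variables named on the left-hand side of its formula, so `Sepal.Width` is still incomplete in `imp_lm` and `Sepal.Length` is still incomplete in `imp_med`. Chaining the calls with `|>`, as in the example above, fills both.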

Materials