Execute OpenRefine JSON scripts without OpenRefine (or Java)

jezcope/pyrefine

PyRefine

OpenRefine is a great tool for exploring and cleaning datasets prior to analysing them. It also records an undo history of all actions that you can export as a sort of script in JSON format. However, in order to execute that script on a new dataset, you need to manually import it through the graphical interface or set up a BatchRefine server, neither of which is quick.

PyRefine allows you to execute OpenRefine JSON scripts against datasets without firing up a full Java/OpenRefine server. It has a command-line tool for quick use, or you can use it as a library to integrate it into your pandas-based data analysis pipeline.

More details in this blog post.

Please note: PyRefine is still very much alpha-quality. It probably doesn't work exactly how you're expecting right now. That said, please try it out, and consider contributing!

Features

  • Execute OpenRefine JSON against a dataset from the command line
  • Execute OpenRefine JSON from a Python script
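To give a feel for what "executing OpenRefine JSON" involves, here is a minimal sketch of the general idea. It is not PyRefine's actual API: the function name `apply_operations`, the sample script, and the sample rows are all hypothetical, and rows are represented as plain dicts rather than a pandas DataFrame to keep the example dependency-free. It handles only two operation types, using the field names that OpenRefine writes for column rename and column removal steps in an exported history.

```python
import json

# A small operation history in the shape OpenRefine exports.
# The column names here are made-up sample data.
SCRIPT = """
[
  {"op": "core/column-rename", "oldColumnName": "nm", "newColumnName": "name"},
  {"op": "core/column-removal", "columnName": "notes"}
]
"""


def apply_operations(rows, operations):
    """Apply OpenRefine-style operations, in order, to a list of row dicts.

    Hypothetical helper for illustration only; it does not mirror
    PyRefine's real interface.
    """
    for op in operations:
        kind = op["op"]
        if kind == "core/column-rename":
            old, new = op["oldColumnName"], op["newColumnName"]
            # Rebuild each row, swapping the old key for the new one.
            rows = [
                {(new if key == old else key): value for key, value in row.items()}
                for row in rows
            ]
        elif kind == "core/column-removal":
            dropped = op["columnName"]
            # Rebuild each row without the removed column.
            rows = [
                {key: value for key, value in row.items() if key != dropped}
                for row in rows
            ]
        else:
            # Fail loudly instead of silently skipping an unknown step.
            raise NotImplementedError(f"unsupported operation: {kind}")
    return rows


rows = [{"nm": "Ada", "notes": "x"}, {"nm": "Grace", "notes": "y"}]
result = apply_operations(rows, json.loads(SCRIPT))
print(result)
```

Raising on unknown operation types is a deliberate choice in this sketch: an alpha-quality interpreter that quietly ignored a step would produce output that looks cleaned but is not.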

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
