nihmporter is Python software that downloads and packs the data published by the National Institutes of Health (NIH).


You can use the environment file provided in the repository to build a suitable Anaconda environment (named nih by default), or inspect it to see the exact requirements.


Activate the above environment and run

# after activating the appropriate conda environment

This should produce several feather/pickle files, each storing a Pandas DataFrame (as of July 2021, huge feather files cause memory issues). Within any one of them, the same record may (and most likely will) appear more than once because, until its final release, the information about a contract is updated in different source files at successive dates, and the script stitches those files together. For more details, see the About section.
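Because a record can appear several times, downstream analysis often needs to pick one version per record. The following is a minimal sketch of one way to keep only the most recent row per record with Pandas. The column names `record_id` and `updated` and the helper `latest_records` are hypothetical and are not taken from nihmporter; the real DataFrames would have to be inspected to find the matching identifier and date columns.

```python
import pandas as pd


def latest_records(df: pd.DataFrame, id_column: str, date_column: str) -> pd.DataFrame:
    """Keep only the most recent row for each value of id_column."""
    # Sort chronologically so that the last occurrence of each id is the newest.
    ordered = df.sort_values(date_column, kind="stable")
    # Drop the earlier occurrences and renumber the remaining rows.
    return ordered.drop_duplicates(subset=id_column, keep="last").reset_index(drop=True)


# Toy data with hypothetical column names: record "A" was updated once.
example = pd.DataFrame(
    {
        "record_id": ["A", "B", "A"],
        "updated": pd.to_datetime(["2021-01-01", "2021-02-01", "2021-03-01"]),
        "amount": [100, 200, 150],
    }
)

latest = latest_records(example, "record_id", "updated")
print(latest)
```

Whether the latest version is the right one to keep depends on the analysis; the repeated rows also carry the update history of each contract.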

The script also produces a number of CSV files, which subset the above feather/pickle files into the data used by the (extra) utility.


If the script is re-run in the same directory, many existing files are reused rather than downloaded again. In particular, whenever the program is about to download a zip file, it does so only if the file is not already present locally, or if the file of the same name on the server is more recent (in which case the local file is overwritten).
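The reuse rule described above can be sketched as a small decision function. This is an illustration under stated assumptions, not nihmporter's actual code: the name `needs_download` is hypothetical, and the server's modification time is assumed to be available as a timezone-aware `datetime` (for example, parsed from an HTTP `Last-Modified` header).

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def needs_download(local_path: Path, remote_modified: Optional[datetime]) -> bool:
    """Return True if the file is missing locally or the server copy is newer.

    remote_modified must be timezone-aware, or None when the server does not
    report a modification time.
    """
    if not local_path.exists():
        # Nothing to reuse, so the file has to be fetched.
        return True
    if remote_modified is None:
        # Assumption: without a server timestamp, keep the local copy.
        return False
    # Compare both timestamps in UTC to avoid local-timezone surprises.
    local_modified = datetime.fromtimestamp(local_path.stat().st_mtime, tz=timezone.utc)
    return remote_modified > local_modified
```

A caller would overwrite the local zip file only when this function returns `True`, which matches the behaviour described for re-runs.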

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 101004870. H2020-SC6-GOVERNANCE-2018-2019-2020 / H2020-SC6-GOVERNANCE-2020