
Generate Pandas frames, load and extract data, based on JSON Table Schema descriptors.


datadotworld/jsontableschema-pandas-py

 
 


jsontableschema-pandas


Generate and load Pandas data frames based on JSON Table Schema descriptors.

Version v0.2 contains breaking changes:

  • removed Storage(prefix=) argument (was a stub)
  • renamed Storage(tables=) to Storage(dataframes=)
  • renamed Storage.tables to Storage.buckets
  • changed Storage.read to read into memory
  • added Storage.iter to yield row by row

Getting Started

Installation

$ pip install datapackage
$ pip install jsontableschema-pandas

Example

You can load resources from a data package as Pandas data frames with the datapackage.push_datapackage function:

>>> import datapackage

>>> data_url = 'http://data.okfn.org/data/core/country-list/datapackage.json'
>>> storage = datapackage.push_datapackage(data_url, 'pandas')

>>> storage.buckets
['data___data']

>>> type(storage['data___data'])
<class 'pandas.core.frame.DataFrame'>

>>> storage['data___data'].head()
             Name Code
0     Afghanistan   AF
1   Åland Islands   AX
2         Albania   AL
3         Algeria   DZ
4  American Samoa   AS

You can also pull an existing data frame into a data package:

>>> datapackage.pull_datapackage('/tmp/datapackage.json', 'country_list', 'pandas', dataframes={
...     'data': storage['data___data'],
... })

Storage

The package implements the Tabular Storage interface.

Create a storage instance as follows:

>>> from jsontableschema_pandas import Storage

>>> storage = Storage()

Storage works as a container for Pandas data frames. You can define a new data frame inside the storage with the storage.create method:

>>> storage.create('data', {
...     'primaryKey': 'id',
...     'fields': [
...         {'name': 'id', 'type': 'integer'},
...         {'name': 'comment', 'type': 'string'},
...     ]
... })

>>> storage.buckets
['data']

>>> storage['data'].shape
(0, 0)
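Before passing a descriptor to storage.create, it can help to confirm that its primary key refers to a declared field. The following is a minimal sketch using only the standard library. The helper name `check_descriptor` is hypothetical and not part of this package; it assumes the usual JSON Table Schema layout, where `primaryKey` is either a single field name or a list of names.

```python
def check_descriptor(descriptor):
    """Return the field names; raise ValueError if the primary key is unknown."""
    names = [field['name'] for field in descriptor.get('fields', [])]
    key = descriptor.get('primaryKey', [])
    # A primary key may be given as one name or as a list of names.
    key_fields = [key] if isinstance(key, str) else list(key)
    missing = [name for name in key_fields if name not in names]
    if missing:
        raise ValueError('primaryKey refers to unknown fields: %s' % missing)
    return names


descriptor = {
    'primaryKey': 'id',
    'fields': [
        {'name': 'id', 'type': 'integer'},
        {'name': 'comment', 'type': 'string'},
    ],
}
field_names = check_descriptor(descriptor)
```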

Use storage.write to populate the data frame with data:

>>> storage.write('data', [(1, 'a'), (2, 'b')])

>>> storage['data']
id comment
1        a
2        b

You can also use tabulator to populate the data frame from an external data file:

>>> import tabulator

>>> with tabulator.Stream('data/comments.csv', headers=1) as stream:
...     storage.write('data', stream)

>>> storage['data']
id comment
1        a
2        b
1     good

As you can see, subsequent writes append new rows to the existing data.
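The append behaviour shown above can be approximated with plain Python. This is a sketch under stated assumptions, not the package's implementation: the CSV content is a guess at what data/comments.csv might hold, `CASTS` and `cast_rows` are hypothetical names, and only the two field types used in this example are covered.

```python
import csv
import io

# Hypothetical mapping from schema field types to Python casts.
CASTS = {'integer': int, 'string': str}


def cast_rows(descriptor, text_rows):
    """Yield typed tuples from rows of strings, following the descriptor."""
    casts = [CASTS[field['type']] for field in descriptor['fields']]
    for row in text_rows:
        yield tuple(cast(value) for cast, value in zip(casts, row))


descriptor = {
    'primaryKey': 'id',
    'fields': [
        {'name': 'id', 'type': 'integer'},
        {'name': 'comment', 'type': 'string'},
    ],
}

# Rows already written in the first call.
bucket = [(1, 'a'), (2, 'b')]

# Assumed file content; the real data/comments.csv is not shown in this README.
csv_text = 'id,comment\n1,good\n'
reader = csv.reader(io.StringIO(csv_text))
next(reader)  # skip the header row

# A second write extends the bucket instead of replacing it.
bucket.extend(cast_rows(descriptor, reader))
```

In this sketch the repeated id value is kept as a separate row, which mirrors the output above where id 1 appears twice.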

API Reference

Snapshot

https://github.com/frictionlessdata/jsontableschema-py#snapshot

Detailed

Contributing

Please read the contribution guideline:

How to Contribute

Thanks!
