datopy: data management tools for Python

datopy (da-toh-pie) is a Python library for people who work with unstructured data, providing a simple workflow for building data models and ETL pipelines.

This package also includes utilities for:

Data retrieval
Input/Output
Jupyter notebook workflows

Note

This project is under active development.

Getting Started

Installation

To use datopy, first install it from GitHub using pip:

$ pip install "git+https://github.com/bainmatt/datopy.git#egg=datopy"
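
To confirm that the install succeeded, you can query the installed distribution's metadata from Python. The snippet below is a minimal sketch using only the standard library. It assumes the distribution is registered under the name `datopy` (taken from the `#egg=datopy` fragment above), and `installed_version` is a hypothetical helper, not part of datopy.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# "datopy" is the assumed distribution name; this prints None if it is not installed.
print(installed_version("datopy"))
```

If this prints `None`, check that pip installed into the same environment as the interpreter you are running.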

Cloning

Step 1. Clone the repo:

$ git clone https://github.com/bainmatt/datopy.git
$ cd datopy

Step 2. Install dependencies:

$ conda env create -f environment.yml
$ conda activate dato-py

Development

TODO

Usage

Dataset inspection (`datopy.inspection`)

Produce multiple parallel, informative displays of Pandas data frames and NumPy arrays for data exploration and inspection.

>>> import numpy as np
>>> import pandas as pd
>>> from datopy.inspection import display, make_df

>>> df1 = make_df('AB', [1, 2])
>>> df2 = make_df('AB', [3, 4])
>>> display('df1', 'df2', 'pd.concat([df1, df2])', globs=globals(), bold=False)

df1
--- (2, 2) ---
   A   B
1  A1  B1
2  A2  B2


df2
--- (2, 2) ---
   A   B
3  A3  B3
4  A4  B4


pd.concat([df1, df2])
--- (4, 2) ---
   A   B
1  A1  B1
2  A2  B2
3  A3  B3
4  A4  B4
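
The call above passes expression strings together with `globs=globals()`, which suggests the strings are evaluated in the caller's namespace before being printed. The sketch below illustrates that general technique in plain Python. It is not datopy's implementation: `describe` is a hypothetical stand-in, and the layout (label, shape line, value) is only modelled on the output shown above.

```python
def describe(*exprs, globs):
    """Build a report with each expression string, its shape (if any), and its repr."""
    blocks = []
    for expr in exprs:
        # Evaluate the string in the supplied namespace. Only do this with
        # trusted input, since eval runs arbitrary code.
        value = eval(expr, globs)
        # Objects such as DataFrames and arrays expose .shape; plain lists do not.
        shape = getattr(value, "shape", None)
        header = f"--- {shape} ---" if shape is not None else "---"
        blocks.append(f"{expr}\n{header}\n{value!r}")
    # Separate the blocks with two blank lines, as in the output above.
    return "\n\n\n".join(blocks)


# Demo with plain lists so the sketch needs no third-party packages.
namespace = {"a": [1, 2], "b": [3, 4]}
print(describe("a", "a + b", globs=namespace))
```

Passing the namespace explicitly is what lets a helper like this label each display with the exact expression the caller wrote, including compound ones such as `pd.concat([df1, df2])`.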

Metadata scraping (`datopy._media_scrape`)

Retrieve media-related data from Spotify, IMDb, and Wikipedia.

TODO

Acknowledgements

datopy is powered by:

mypy type checking
pytest unit testing
Flake8 linting
Sphinx documentation
numpydoc docstrings
PyData theming
Read the Docs hosting
GitHub Actions continuous integration
PyPI packaging
Pydantic data validation

License

This project is licensed under the MIT License.

Contact

Project Link: https://github.com/bainmatt/datopy
