
eyana-m/python-data-dictionary-writer


Python Data Dictionary Writer

Add custom HTML elements to SchemaSpy pages using Python scripts

Keywords: Python, Web scraping, CSV, HTML, automation

Why did I write this script?

  • To automate my process of inserting descriptions for multiple tables and fields in the data dictionary.
  • To optimize my workflow when writing descriptions for more than a thousand fields by working only on the unique fields.

What can the scripts do?

  1. Add table descriptions dynamically to the SchemaSpy index page.

    • Before: original index.html
    • After: resulting index.html with table descriptions for all 139 tables

  2. Add field descriptions dynamically to each SchemaSpy table page.

    • Before: original cfg_billing_id.html
    • After: resulting cfg_billing_id.html with its table description (same as the index) and field descriptions
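The kind of HTML edit described above can be sketched with BeautifulSoup. This is a minimal illustration, not the repo's exact script: the function name and the heading used as the insertion anchor are assumptions, since SchemaSpy page structure varies by version.

```python
from bs4 import BeautifulSoup

def add_table_description(html_path, description):
    """Insert a 'Description: ...' <div> after the page's first heading.

    Sketch only: the <h1>/<h2> anchor is an assumption about the
    SchemaSpy page layout, not something this repo guarantees.
    """
    with open(html_path, encoding="utf-8") as f:
        soup = BeautifulSoup(f.read(), "html.parser")
    # Build <div><strong>Description: </strong>{description}</div>
    div = soup.new_tag("div")
    strong = soup.new_tag("strong")
    strong.string = "Description: "
    div.append(strong)
    div.append(description)  # plain text after the label
    heading = soup.find(["h1", "h2"])
    if heading is not None:
        heading.insert_after(div)
    with open(html_path, "w", encoding="utf-8") as f:
        f.write(str(soup))
```

The same pattern works for the index page and the per-table pages; only the anchor element differs.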

Prerequisites

  1. Install pip. (See "How to install pip in mac" under Resources.)

  2. Install BeautifulSoup4:

cd beautifulsoup4-4.6.0
python setup.py install 
  3. Install Pathlib. (Only needed on Python 2; pathlib has been in the standard library since Python 3.4.)
sudo pip install pathlib
  4. Add the SchemaSpy folder to the Data folder.

  5. [For Settlement] The Result folder should have the following subfolders:

  • settlement
  • settlement_csv
  • settlement_table_desc

Releases

Release 1 - Settlement SOW9 (July 2017)


  • Date: July 13, 2017
  • Table Count: 139
  • Field Count: 2,862
  • Note: Forgot to publish release (Sorry!)
Release 1 Workflow
  1. Complete Prerequisites
  2. Export table list to CSV (c/o Google Sheets)
  3. Update table descriptions of index and table pages using writetabledescriptiontohtml.py
  4. If fields have not been extracted to CSV yet:
    • Retrieve and save all unique fields to CSV using retrieveuniquefields.py
    • Write fields and field descriptions to CSV using writefieldstocsv.py
  5. If field descriptions are complete in CSV: Update field descriptions of all tables using writefielddescriptionstohtml.py
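The unique-field step above can be sketched with the standard csv module. This is an illustration only; the two-column layout (field, description) and the first-seen ordering are assumptions, not necessarily what retrieveuniquefields.py does.

```python
import csv

def write_unique_fields(field_names, out_path):
    """Deduplicate field names and write them to a CSV for editing.

    Sketch: the (field, description) header is an assumed layout;
    the description column is left blank for the author to fill in.
    """
    seen = []
    for name in field_names:
        if name not in seen:
            seen.append(name)  # preserve first-seen order
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["field", "description"])
        for name in seen:
            writer.writerow([name, ""])
    return seen
```

With thousands of fields across tables, writing descriptions once per unique field instead of once per occurrence is what makes the workflow tractable.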

Release 2 - Settlement SOW10 (October 2017)


  • Date: October 16, 2017
  • Table Count: 186
  • Field Count: 4,066
  • Note: Applied web scraping to new tables and fields
Release 2 Workflow
  1. Complete Prerequisites
  2. Export table list with descriptions to CSV (c/o Google Sheets)
  3. Update table descriptions of index and table pages using writeTableDescriptiontoHTML.py
  4. If the new SOW10 fields have not been extracted yet:
    • Retain SOW9 unique fields unique_fields-sow9.csv
    • Retrieve all new SOW10 fields to CSV using retrievenewfields.py
    • Write SOW9 and SOW10 fields to CSV using writeFieldsToCSV.py
  5. If field descriptions are complete in CSV: Update field descriptions of all tables using writeFieldDescriptionsToHTML.py

TL;DR version: How to use these scripts?

  1. Export table masterlist with descriptions to CSV (c/o Google Sheets).

    • Default Directory: ../../Google Drive/Python/CSV_dump/Settlement-Tables-Descriptions.csv
  2. Run writetabledescriptiontohtml.py

    • Write table description to each table html page
    • Result: Result/settlement_tables_desc/tables/
  3. Run retrieveuniquefields.py

    • Retrieve all common and unique fields from all table html pages. Save to CSV
    • Result: Result/settlement_csv/unique_fields.csv
  4. Update unique_fields.csv

    • User can add description to all unique fields in just one CSV file
  5. Run writefieldstocsv.py

    • Retrieve fields from table html. Add descriptions of common and unique fields from unique_fields.csv
    • Result: Result/settlement_csv/*
  6. (Optional) Update Result/settlement_csv/* csv files

    • User can modify descriptions for specific table CSV files
  7. Run writefielddescriptionstohtml.py

    • HTML Source: Result/settlement_tables_desc/tables/
    • Content Source: Result/settlement_csv/*
    • Write field descriptions from table csv to each table html.
    • Result: Result/settlement/tables
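The CSV export in step 1 can be turned into a lookup table so each HTML page finds its own description by table name. A minimal sketch; the two-column header (table, description) is an assumption about the Google Sheets export, and the function name is hypothetical.

```python
import csv

def load_descriptions(csv_path):
    """Map table name -> description from the exported masterlist CSV.

    Sketch: assumes a header row with 'table' and 'description' columns.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        return {row["table"]: row["description"] for row in csv.DictReader(f)}
```

Usage: `descriptions.get("cfg_billing_id", "")` then feeds the per-page HTML update in steps 2 and 7.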

Things I learned from this mini project:

  1. Google Apps Script
  • Export Tables Masterlist to CSV
  • Script not in this repository
  2. Python
  • Read HTML files from the SchemaSpy folder (BeautifulSoup)
  • Retrieve selected items from HTML pages (BeautifulSoup)
  • Modify HTML tag attributes (BeautifulSoup)
  • Read CSV files
  • Write CSV files
  • Write HTML into HTML files based on CSV content (Pathlib)
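Reading every table page from the SchemaSpy folder, as in the first Python bullet, can be done with a short pathlib loop. A sketch under assumptions: the directory argument and generator name are hypothetical, not the repo's actual code.

```python
from pathlib import Path

def iter_table_pages(tables_dir):
    """Yield (filename, html_text) for every .html page in a folder.

    Sketch: sorted() gives a stable order; non-HTML files are skipped
    by the glob pattern.
    """
    for path in sorted(Path(tables_dir).glob("*.html")):
        yield path.name, path.read_text(encoding="utf-8")
```

Each yielded page can then be parsed with BeautifulSoup and written back after editing.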

Project Logs

Check out my logs!

HTML Customizations:

Done - Line 92: Add the following tag

<!----Table Description---->
<br>
<div><strong>Description: </strong> {Insert description here from csv source}</div>
<br>
<!----Table Description---->
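Inserting that raw fragment into a parsed page can be done by parsing the string with BeautifulSoup first, which keeps the comment markers and tags intact (cf. the "Inserting HTML String into BS4 Object" resource below). A sketch only: the anchor element (the page's first `<table>`) is an assumption about the SchemaSpy layout.

```python
from bs4 import BeautifulSoup

# The raw fragment from above, with a placeholder for the CSV description.
SNIPPET = (
    "<!----Table Description---->"
    "<br><div><strong>Description: </strong>{desc}</div><br>"
    "<!----Table Description---->"
)

def insert_description(page_html, desc):
    """Parse the fragment and splice its nodes in before the first <table>.

    Sketch: inserting the fragment's children one by one keeps their
    original order and works across BeautifulSoup versions.
    """
    soup = BeautifulSoup(page_html, "html.parser")
    fragment = BeautifulSoup(SNIPPET.format(desc=desc), "html.parser")
    anchor = soup.find("table")
    if anchor is not None:
        for el in list(fragment.contents):
            anchor.insert_before(el)
    return str(soup)
```
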

Done - Line 40: Add checked attribute to the Comments checkbox

<label for='showComments'><input type='checkbox' checked id='showComments'>Comments</label>
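Rather than hand-editing the tag, the checked state can be set through BeautifulSoup's tag-attribute access (the "Modify HTML tag attributes" item above). A minimal sketch, assuming the checkbox keeps the id `showComments`:

```python
from bs4 import BeautifulSoup

def check_comments_box(page_html):
    """Tick the 'Comments' checkbox by setting its checked attribute.

    Sketch: 'checked' is a boolean HTML attribute, so an empty value
    is enough for browsers to render the box as checked.
    """
    soup = BeautifulSoup(page_html, "html.parser")
    box = soup.find("input", id="showComments")
    if box is not None:
        box["checked"] = ""
    return str(soup)
```
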

Resources:

Libraries used:

  1. BeautifulSoup4
  2. Pathlib

Stack Overflow resources:

  1. Inserting HTML String into BS4 Object

  2. Getting the filename dynamically

  3. Looping directory using python

  4. Writing CSV using Python

  5. Finding html element by class

  6. Writing to BS4 Find object

  7. Python: How to check if cell in csv file is empty

  8. How to install pip in mac

  9. How can I remove DS store files from a git repository