
eyana-m/python-data-dictionary-writer


Python Data Dictionary Writer

Add custom HTML elements to SchemaSpy pages using Python scripts

Keywords: Python, Web scraping, CSV, HTML, automation

Why did I write this script?

  • To automate my process of inserting descriptions for multiple tables and fields in the data dictionary.
  • To optimize my workflow when writing descriptions for more than a thousand fields by working only on the unique fields.

What can the scripts do?

  1. Add table descriptions dynamically to the SchemaSpy index page.

    • Before: original index.html
    • After: resulting index.html with table descriptions for all 139 tables

  2. Add field descriptions dynamically to each SchemaSpy table page.

    • Before: original cfg_billing_id.html
    • After: resulting cfg_billing_id.html with its table description (same as the index) and field descriptions
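The kind of HTML edit described above can be sketched with BeautifulSoup. This is a minimal illustration, not the repo's exact script: the function name and the heading used as the insertion anchor are assumptions, since SchemaSpy page structure varies by version.

```python
from bs4 import BeautifulSoup

def add_table_description(html_path, description):
    """Insert a 'Description: ...' <div> after the page's first heading.

    Sketch only: the <h1>/<h2> anchor is an assumption about the
    SchemaSpy page layout, not something this repo guarantees.
    """
    with open(html_path, encoding="utf-8") as f:
        soup = BeautifulSoup(f.read(), "html.parser")
    # Build <div><strong>Description: </strong>{description}</div>
    div = soup.new_tag("div")
    strong = soup.new_tag("strong")
    strong.string = "Description: "
    div.append(strong)
    div.append(description)  # plain text after the label
    heading = soup.find(["h1", "h2"])
    if heading is not None:
        heading.insert_after(div)
    with open(html_path, "w", encoding="utf-8") as f:
        f.write(str(soup))
```

The same pattern works for the index page and the per-table pages; only the anchor element differs.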

Prerequisites

  1. Install pip. (See "How to install pip in mac" under Resources.)

  2. Install BeautifulSoup4:

cd beautifulsoup4-4.6.0
python setup.py install 
  3. Install Pathlib. (Only needed on Python 2; pathlib has been in the standard library since Python 3.4.)
sudo pip install pathlib
  4. Add the SchemaSpy folder to the Data folder.

  5. [For Settlement] The Result folder should have the following subfolders:

  • settlement
  • settlement_csv
  • settlement_table_desc

Releases

Release 1 - Settlement SOW9 (July 2017)


  • Date: July 13, 2017
  • Table Count: 139
  • Field Count: 2,862
  • Note: Forgot to publish release (Sorry!)
Release 1 Workflow
  1. Complete Prerequisites
  2. Export table list to CSV (c/o Google Sheets)
  3. Update table descriptions of index and table pages using writetabledescriptiontohtml.py
  4. If fields have not been extracted to CSV yet:
    • Retrieve and save all unique fields to CSV using retrieveuniquefields.py
    • Write fields and field descriptions to CSV using writefieldstocsv.py
  5. If field descriptions are complete in CSV: Update field descriptions of all tables using writefielddescriptionstohtml.py
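The unique-field step above can be sketched with the standard csv module. This is an illustration only; the two-column layout (field, description) and the first-seen ordering are assumptions, not necessarily what retrieveuniquefields.py does.

```python
import csv

def write_unique_fields(field_names, out_path):
    """Deduplicate field names and write them to a CSV for editing.

    Sketch: the (field, description) header is an assumed layout;
    the description column is left blank for the author to fill in.
    """
    seen = []
    for name in field_names:
        if name not in seen:
            seen.append(name)  # preserve first-seen order
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["field", "description"])
        for name in seen:
            writer.writerow([name, ""])
    return seen
```

With thousands of fields across tables, writing descriptions once per unique field instead of once per occurrence is what makes the workflow tractable.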

Release 2 - Settlement SOW10 (October 2017)


  • Date: October 16, 2017
  • Table Count: 186
  • Field Count: 4,066
  • Note: Applied web scraping to new tables and fields
Release 2 Workflow
  1. Complete Prerequisites
  2. Export table list with descriptions to CSV (c/o Google Sheets)
  3. Update table descriptions of index and table pages using writeTableDescriptiontoHTML.py
  4. If the new SOW10 fields have not been extracted yet:
    • Retain SOW9 unique fields unique_fields-sow9.csv
    • Retrieve all new SOW10 fields to CSV using retrievenewfields.py
    • Write SOW9 and SOW10 fields to CSV using writeFieldsToCSV.py
  5. If field descriptions are complete in CSV: Update field descriptions of all tables using writeFieldDescriptionsToHTML.py

TL;DR version: How to use these scripts?

  1. Export table masterlist with descriptions to CSV (c/o Google Sheets).

    • Default Directory: ../../Google Drive/Python/CSV_dump/Settlement-Tables-Descriptions.csv
  2. Run writetabledescriptiontohtml.py

    • Write table description to each table html page
    • Result: Result/settlement_tables_desc/tables/
  3. Run retrieveuniquefields.py

    • Retrieve all common and unique fields from all table html pages. Save to CSV
    • Result: Result/settlement_csv/unique_fields.csv
  4. Update unique_fields.csv

    • User can add description to all unique fields in just one CSV file
  5. Run writefieldstocsv.py

    • Retrieve fields from table html. Add descriptions of common and unique fields from unique_fields.csv
    • Result: Result/settlement_csv/*
  6. (Optional) Update Result/settlement_csv/* csv files

    • User can modify descriptions for specific table CSV files
  7. Run writefielddescriptionstohtml.py

    • HTML Source: Result/settlement_tables_desc/tables/
    • Content Source: Result/settlement_csv/*
    • Write field descriptions from table csv to each table html.
    • Result: Result/settlement/tables
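The CSV export in step 1 can be turned into a lookup table so each HTML page finds its own description by table name. A minimal sketch; the two-column header (table, description) is an assumption about the Google Sheets export, and the function name is hypothetical.

```python
import csv

def load_descriptions(csv_path):
    """Map table name -> description from the exported masterlist CSV.

    Sketch: assumes a header row with 'table' and 'description' columns.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        return {row["table"]: row["description"] for row in csv.DictReader(f)}
```

Usage: `descriptions.get("cfg_billing_id", "")` then feeds the per-page HTML update in steps 2 and 7.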

Things I learned from this mini project:

  1. Google Apps Script
  • Export Tables Masterlist to CSV
  • Script not in this repository
  2. Python
  • Read HTML files from the SchemaSpy folder (BeautifulSoup)
  • Retrieve selected items from HTML pages (BeautifulSoup)
  • Modify HTML tag attributes (BeautifulSoup)
  • Read CSV files
  • Write CSV files
  • Write HTML into HTML files based on CSV content (Pathlib)
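Reading every table page from the SchemaSpy folder, as in the first Python bullet, can be done with a short pathlib loop. A sketch under assumptions: the directory argument and generator name are hypothetical, not the repo's actual code.

```python
from pathlib import Path

def iter_table_pages(tables_dir):
    """Yield (filename, html_text) for every .html page in a folder.

    Sketch: sorted() gives a stable order; non-HTML files are skipped
    by the glob pattern.
    """
    for path in sorted(Path(tables_dir).glob("*.html")):
        yield path.name, path.read_text(encoding="utf-8")
```

Each yielded page can then be parsed with BeautifulSoup and written back after editing.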

Project Logs

Check out my logs!

HTML Customizations:

Done - Line 92: Add the following tag

<!----Table Description---->
<br>
<div><strong>Description: </strong> {Insert description here from csv source}</div>
<br>
<!----Table Description---->
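Inserting that raw fragment into a parsed page can be done by parsing the string with BeautifulSoup first, which keeps the comment markers and tags intact (cf. the "Inserting HTML String into BS4 Object" resource below). A sketch only: the anchor element (the page's first `<table>`) is an assumption about the SchemaSpy layout.

```python
from bs4 import BeautifulSoup

# The raw fragment from above, with a placeholder for the CSV description.
SNIPPET = (
    "<!----Table Description---->"
    "<br><div><strong>Description: </strong>{desc}</div><br>"
    "<!----Table Description---->"
)

def insert_description(page_html, desc):
    """Parse the fragment and splice its nodes in before the first <table>.

    Sketch: inserting the fragment's children one by one keeps their
    original order and works across BeautifulSoup versions.
    """
    soup = BeautifulSoup(page_html, "html.parser")
    fragment = BeautifulSoup(SNIPPET.format(desc=desc), "html.parser")
    anchor = soup.find("table")
    if anchor is not None:
        for el in list(fragment.contents):
            anchor.insert_before(el)
    return str(soup)
```
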

Done - Line 40: Add checked attribute to the Comments checkbox

<label for='showComments'><input type='checkbox' checked id='showComments'>Comments</label>
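Rather than hand-editing the tag, the checked state can be set through BeautifulSoup's tag-attribute access (the "Modify HTML tag attributes" item above). A minimal sketch, assuming the checkbox keeps the id `showComments`:

```python
from bs4 import BeautifulSoup

def check_comments_box(page_html):
    """Tick the 'Comments' checkbox by setting its checked attribute.

    Sketch: 'checked' is a boolean HTML attribute, so an empty value
    is enough for browsers to render the box as checked.
    """
    soup = BeautifulSoup(page_html, "html.parser")
    box = soup.find("input", id="showComments")
    if box is not None:
        box["checked"] = ""
    return str(soup)
```
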

Resources:

Libraries used:

  1. BeautifulSoup4
  2. Pathlib

Stack Overflow resources:

  1. Inserting HTML String into BS4 Object

  2. Getting the filename dynamically

  3. Looping directory using python

  4. Writing CSV using Python

  5. Finding html element by class

  6. Writing to BS4 Find object

  7. Python: How to check if cell in csv file is empty

  8. How to install pip in mac

  9. How can I remove DS store files from a git repository