Migration of the old pedsnetcdm_to_pcornetcdm into Airflow, utilizing Spark

PEDSnet/airflow_pedsnetcdm_to_pcornetcdm

airflow_pedsnetcdm_to_pcornetcdm

Migration of the old pedsnetcdm_to_pcornetcdm into Airflow, utilizing Spark

The ETL currently takes PEDSnet Version 5.2 as its source and PCORnet Version 6.1 as its target.

Details of contents:

  • pedsnet_to_pcornet_ETL.py contains the DAG definition and its corresponding tasks
  • ETL_logic directory contains
    • /data directory containing .csv files used to generate mapping tables in Postgres
    • /sql directory containing .sql files executed in the pedsnetcdm_to_pcornetcdm DAG by SQL Operator tasks
    • /spark directory containing .py PySpark files executed in the pedsnetcdm_to_pcornetcdm DAG by Spark Operator tasks. This logic was migrated out of .sql and into PySpark to improve runtime performance
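
As a rough illustration of how the layout above could map onto DAG tasks, the sketch below scans the /sql and /spark directories and pairs each file with the kind of operator that would run it. It uses only the Python standard library and does not import Airflow. The names `plan_tasks`, `TaskSpec`, and `OPERATOR_KINDS`, and the task-id scheme, are assumptions made for illustration. The actual DAG in pedsnet_to_pcornet_ETL.py may define its tasks differently.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path

# Hypothetical mapping from (subdirectory, file extension) to the kind of
# operator that would run the file. The real DAG may use other operator classes.
OPERATOR_KINDS = {
    ("sql", ".sql"): "sql_operator",
    ("spark", ".py"): "spark_operator",
}


@dataclass(frozen=True)
class TaskSpec:
    task_id: str  # derived from the file name (an assumed convention)
    kind: str     # "sql_operator" or "spark_operator"
    path: Path    # file the task would execute


def plan_tasks(etl_root: Path) -> list[TaskSpec]:
    """Return one TaskSpec per .sql file in sql/ and per .py file in spark/."""
    specs = []
    for (subdir, suffix), kind in OPERATOR_KINDS.items():
        folder = etl_root / subdir
        if not folder.is_dir():
            continue  # tolerate a missing directory in this sketch
        for path in sorted(folder.glob(f"*{suffix}")):
            specs.append(TaskSpec(f"{subdir}_{path.stem}", kind, path))
    return specs


if __name__ == "__main__":
    # Print the planned tasks for a local checkout, if one is present.
    for spec in plan_tasks(Path("ETL_logic")):
        print(f"{spec.task_id:40} {spec.kind:15} {spec.path}")
```

Run from the repository root, this prints one line per file it finds. In a real DAG, each entry would be handed to the matching Airflow operator rather than printed.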
