This project extracts data from Azure Data Lake Storage Gen2, transforms it, and loads it into an Azure SQL database.

MuhammadHasaanWahid/Data-Cleaning-Pipeline-ETL

Data-Cleaning-Pipeline-ETL

Click here to see the dataset.

Problem Statement

The task was to extract two CSV files (40k+ rows of records) from Azure Data Lake Storage Gen2, combine them with a SQL join into a single table, clean the result by removing null values and unnecessary columns, and then load it into an Azure SQL database.
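The join-and-clean logic described above can be sketched outside Azure Data Factory. The following is only an illustration, not the pipeline's actual implementation: it uses Python's built-in sqlite3 as a stand-in for both the data lake source and the Azure SQL destination, and the sample data, table names (orders, customers, cleaned), and column names are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for the two CSV files in the data lake.
ORDERS_CSV = """order_id,customer_id,amount,internal_note
1,10,25.50,ok
2,11,,check
3,12,40.00,ok
"""
CUSTOMERS_CSV = """customer_id,name
10,Alice
11,Bob
12,
"""


def load_csv(conn, table, text):
    """Create `table` from CSV text, storing empty fields as NULL."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    cols = ", ".join(f'"{c}"' for c in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f"CREATE TABLE {table} ({cols})")
    conn.executemany(
        f"INSERT INTO {table} VALUES ({marks})",
        [[v if v != "" else None for v in row] for row in data],
    )


def run_pipeline():
    """Join the two sources, clean the result, and store it in one table."""
    conn = sqlite3.connect(":memory:")  # stand-in for the SQL destination
    load_csv(conn, "orders", ORDERS_CSV)
    load_csv(conn, "customers", CUSTOMERS_CSV)
    # SQL join into a single table; the SELECT list omits the unneeded
    # column (internal_note) and the WHERE clause drops rows with nulls.
    conn.execute(
        """
        CREATE TABLE cleaned AS
        SELECT o.order_id, c.name, CAST(o.amount AS REAL) AS amount
        FROM orders AS o
        JOIN customers AS c ON c.customer_id = o.customer_id
        WHERE o.amount IS NOT NULL AND c.name IS NOT NULL
        """
    )
    return conn.execute(
        "SELECT order_id, name, amount FROM cleaned ORDER BY order_id"
    ).fetchall()


rows = run_pipeline()
print(rows)
```

In this sketch, order 2 is dropped because its amount is missing and order 3 is dropped because the matching customer has no name, so only the fully populated joined row reaches the destination table.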

The JSON files

The First Portfolio Project.json file describes the ADF pipeline: its name, description, and the resources that make it up. The manifest.json file describes the dependencies and structure of the pipeline's ARM template in Azure Data Factory.

Workflow

Untitled Diagram drawio

Pipeline Structure

Capture

Data at the Destination (SQL database)

Capture2
