💜🌈📊 A Data Engineering Project that implements an ETL data pipeline using Dagster, Apache Spark, Streamlit, MinIO, Metabase, Dbt, Polars, Docker 🌺
Updated Jun 7, 2024 - Jupyter Notebook
A Python script extracts data from Zillow and stores it in an initial S3 bucket. Lambda functions then handle the flow: copying the data to a processing bucket and transforming it from JSON to CSV. The final CSV data lands in another S3 bucket, ready to be loaded into Amazon Redshift for in-depth analysis, with QuickSight for visualizations.
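The Lambda transformation step in that flow can be sketched as a pure stdlib function. This is a minimal sketch, not the project's actual handler: the record fields are hypothetical, and real Zillow payloads would need field selection and flattening of nested values.

```python
import csv
import io
import json


def json_to_csv(json_payload: str) -> str:
    """Convert a JSON array of flat records into CSV text.

    Simplified stand-in for the Lambda transform step; assumes each
    record shares the keys of the first one.
    """
    records = json.loads(json_payload)
    if not records:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Two hypothetical listing records standing in for the scraped payload.
payload = json.dumps([
    {"zpid": "1", "price": 450000, "city": "Seattle"},
    {"zpid": "2", "price": 320000, "city": "Austin"},
])
csv_text = json_to_csv(payload)
```

In a real Lambda handler this function would sit between a boto3 `get_object` read from the processing bucket and a `put_object` write to the destination bucket.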
Service for bulk-loading data to databases with automatic schema management (Redshift, Snowflake, BigQuery, ClickHouse, Postgres, MySQL)
No-code LLM Platform to launch APIs and ETL Pipelines to structure unstructured documents
ML pipeline to categorize emergency messages based on the needs communicated by the sender.
Data engineering project implementing a customer suppression/quarantine table using PySpark, Spark SQL, Pandas, and APIs on Databricks.
This process illustrates how to structure and manipulate relational databases effectively, demonstrating key SQL operations and transformations within an Informatica environment. The provided images and detailed SQL commands serve as a comprehensive guide for implementing and understanding these database management tasks.
Aids for the public as a web app.
A CLI tool for transforming large RDF datasets using pure SPARQL.
Make stream processing easier! Easy-to-use streaming application development framework and operation platform.
One ETL tool to rule them all
A framework for writing Unstract Tools/Apps
Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows, that encode lineage and metadata. Runs and scales everywhere python does.
Move your data with ease.
With the help of this repository you can evaluate my professional level. Everything I know about Data Engineering is stored here.
Crypto scraping project built around an ETL process: data is scraped from a website, transformed, and loaded into a SQL Server database.
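A scrape-transform-load flow of that shape can be sketched as follows. The record fields and table name are hypothetical, the extract step is stubbed with sample data, and sqlite3 stands in for SQL Server to keep the example self-contained; a real pipeline would fetch and parse the site's HTML and load through a SQL Server driver such as pyodbc.

```python
import sqlite3


def transform(raw_rows):
    """Normalize scraped rows: strip currency formatting, cast to float."""
    cleaned = []
    for row in raw_rows:
        cleaned.append({
            "symbol": row["symbol"].strip().upper(),
            "price_usd": float(row["price"].replace("$", "").replace(",", "")),
        })
    return cleaned


def load(rows, conn):
    """Insert transformed rows into a prices table, creating it if needed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS crypto_prices (symbol TEXT, price_usd REAL)"
    )
    conn.executemany(
        "INSERT INTO crypto_prices (symbol, price_usd) "
        "VALUES (:symbol, :price_usd)",
        rows,
    )
    conn.commit()


# Extract step stubbed with sample rows; a real scraper would fetch them.
raw = [
    {"symbol": " btc", "price": "$64,250.10"},
    {"symbol": "eth", "price": "$3,100.00"},
]
conn = sqlite3.connect(":memory:")
load(transform(raw), conn)
```

Keeping the transform a pure function of its input makes the messy part of the pipeline trivially unit-testable, independent of both the scraper and the database.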
Jayvee is a domain-specific language and runtime for automated processing of data pipelines
Copy data from Azure Blob Storage to Amazon S3 using code. View Azure costs using Amazon QuickSight