Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage and metadata. It runs and scales everywhere Python does.
Updated May 28, 2024 - Jupyter Notebook
Airflow DAGs for the Stellar ETL project
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
The open source high performance ELT framework powered by Apache Arrow
The Frank!Framework is an easy-to-use, stateless integration framework which allows (transactional) messages to be modified and exchanged between different systems.
Data transformation framework for ETL processing with SQL-like syntax and GIS extensions, based on Apache Spark
(Spatial) data harmonisation with hale studio (formerly HUMBOLDT Alignment Editor)
Logstash - transport and process your logs, events, or other data
Opinionated framework for writing ETL pipelines controlled by a central config store.
AI Enhanced DataHive aims to become a centralized hub for data of various kinds, offering templates for collectors that aggregate data centrally for further processing in other applications. The initiative arose from repeated cycles of developing crawlers, extractors, and collectors across numerous projects.
Global Biotic Interactions provides access to existing species interaction datasets
A modern data marketplace that makes collaboration among diverse users (like business, analysts and engineers) easier, increasing efficiency and agility in data projects on AWS.
MTPy is a Python framework that provides a simple and intuitive interface for defining and running data pipelines. It is designed to be automatically deployed to the cloud using Docker.
Apache Spark based 'Dist' utility to supplement Data Cooker ETL tool
Documentation for the TriplyDB and TriplyETL products
Real-Time Event Streaming & Change Data Capture
Stetl, Streaming ETL, is a lightweight geospatial processing and ETL framework written in Python.
Professional portfolio of data science and analytics projects - 2024.