the portable Python dataframe library
Updated Jun 7, 2024 - Python
Open Targets python framework for post-GWAS analysis
💜🌈📊 A Data Engineering Project that implements an ETL data pipeline using Dagster, Apache Spark, Streamlit, MinIO, Metabase, Dbt, Polars, Docker 🌺
Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipelines from simple, reusable components.
ORM for Apache Spark and DataFrames schema manager
A Comprehensive Framework for Building End-to-End Recommendation Systems with State-of-the-Art Models
Hopsworks - Data-Intensive AI platform with a Feature Store
An open source, standard data file format for graph data storage and retrieval.
Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used to generate large simulated / synthetic data sets for tests, POCs, and other uses in Databricks environments, including in Delta Live Tables pipelines.
Data engineering for implementing a customer suppression/quarantine table using PySpark, Spark SQL, Pandas, and APIs on Databricks.
Simple and Distributed Machine Learning
State of the Art Natural Language Processing
Apache Linkis builds a computation middleware layer to facilitate connection, governance and orchestration between the upper applications and the underlying data engines.