pyspark

Here are 3,396 public repositories matching this topic...

ibis-project / ibis

the portable Python dataframe library

Updated Jun 7, 2024
Python

opentargets / gentropy

Open Targets python framework for post-GWAS analysis

python open-source gwas genetics pyspark drug-discovery

Updated Jun 7, 2024
Jupyter Notebook

longNguyen010203 / Youtube-ETL-Pipeline

💜🌈📊 A Data Engineering Project that implements an ETL data pipeline using Dagster, Apache Spark, Streamlit, MinIO, Metabase, Dbt, Polars, Docker 🌺

Updated Jun 7, 2024
Jupyter Notebook

Nike-Inc / koheesio

Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipelines from simple, reusable components.

python pyspark data-engineering pydantic delta-lake

Updated Jun 7, 2024
Python

asuiu / SparkORM

ORM for Apache Spark and DataFrames schema manager

python sqlalchemy orm spark python3 pyspark spark-orm spark-sql pyspark-python sqlalchemy-orm sparkql

Updated Jun 7, 2024
Python

J-sephB-lt-n / useful-code-snippets

A searchable collection of useful little pieces of code

python shell bash cloud spark ec2 graph virtual-machine gcp pyspark dataproc streamlit rustworkx

Updated Jun 7, 2024
Python

sb-ai-lab / RePlay

A Comprehensive Framework for Building End-to-End Recommendation Systems with State-of-the-Art Models

machine-learning deep-learning algorithms evaluation distributed-computing pytorch collaborative-filtering matrix-factorization pyspark recommender-system recommendation-algorithms

Updated Jun 7, 2024
Python

logicalclocks / hopsworks

Hopsworks - Data-Intensive AI platform with a Feature Store

python aws data-science machine-learning serverless azure gcp ml pyspark feature-engineering governance model-serving mlops feature-store feature-management hopsworks kserve

Updated Jun 7, 2024
Java

apache / incubator-graphar

An open source, standard data file format for graph data storage and retrieval.

big-data spark etl graph pyspark graph-analysis data-orchestration graph-storage

Updated Jun 7, 2024
C++

DPetrukhina / pyspark_projects

pyspark

Updated Jun 7, 2024
Jupyter Notebook

mitchelllisle / sparkdantic

✨ A Pydantic to PySpark schema library

schema pyspark pydantic

Updated Jun 7, 2024
Python

databrickslabs / dbldatagen

Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used to generate large simulated / synthetic data sets for tests, POCs, and other uses in Databricks environments, including in Delta Live Tables pipelines.

python spark faker pyspark spark-streaming data-generation databricks synthetic-data datagen datagenerator deltalake datageneration delta-live-tables

Updated Jun 6, 2024
Python

aronmarcus / Pyspark_QuarentenaGlobal_table_Databricks

Data engineering for implementing a customer suppression/quarantine table using PySpark, Spark SQL, Pandas, and APIs on Databricks.

python api data-science big-data sftp pandas pyspark data-engineering sharepoint spark-sql pipeline-pattern modular-design etl-pipeline salesforce-marketing-cloud

Updated Jun 6, 2024
Jupyter Notebook

KevinShindel / MachineLearning

Pandas, Sci-kit, SparkML

scikit-learn pandas pyspark

Updated Jun 6, 2024
Jupyter Notebook

microsoft / SynapseML

Simple and Distributed Machine Learning

Updated Jun 6, 2024
Scala

allisonwang-db / pyspark-data-sources

Custom PySpark Data Sources

pyspark

Updated Jun 7, 2024
Python

JohnSnowLabs / spark-nlp

State of the Art Natural Language Processing

Updated Jun 7, 2024
Scala

baranylcn / churn_w_pyspark

python big-data pyspark churn

Updated Jun 5, 2024
Python

apache / linkis

Apache Linkis builds a computation middleware layer to facilitate connection, governance and orchestration between the upper applications and the underlying data engines.

Updated Jun 7, 2024
Java

capitalone / datacompy

Pandas, Polars, and Spark DataFrame comparison for humans and more!

python data-science data spark numpy pandas pyspark compare dask dataframes fugue polars

Updated Jun 6, 2024
Python
