Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes. (Go, updated Jun 10, 2024)
Apache Spark is an open source distributed general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
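The "implicit data parallelism" mentioned above can be sketched in plain Python: a word count expressed in the map/reduce style that Spark runs partition-by-partition across a cluster. This is an illustrative sketch only, with no Spark installation assumed; the partition layout is invented for the example.

```python
from collections import Counter
from functools import reduce

# Simulated partitions of an input text dataset. In Spark, each
# partition would live on a different executor in the cluster.
partitions = [
    ["spark makes big data simple", "big data needs parallelism"],
    ["spark provides fault tolerance"],
]

# "map" phase: count words independently within each partition.
mapped = [Counter(word for line in part for word in line.split())
          for part in partitions]

# "reduce" phase: merge the per-partition counts, as Spark's
# reduceByKey would do with a shuffle.
word_counts = reduce(lambda a, b: a + b, mapped)

print(word_counts["spark"])  # "spark" appears in two lines -> 2
```

Because the per-partition step is a pure function, the framework can schedule it anywhere and re-run it on failure, which is where the fault tolerance comes from.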
Open source platform for the machine learning lifecycle
Data analysis activities.
An interactive real-time web application, developed for the IBM Advanced Data Science specialization, that uses LSTM networks in TensorFlow to predict stock market trends for global companies.
lakeFS - Data version control for your data lake | Git for data
Don't Panic. This guide will help you when it feels like the end of the world.
Data analysis project using Azure, Apache Spark, and Python to process Tokyo Olympic data.
Big Data RDF Processing and Analytics Stack built on Apache Spark and Apache Jena http://sansa-stack.github.io/SANSA-Stack/
An Apache Spark application to analyze word frequencies and compute TF-IDF weights across multiple text file sets using Spark's MLlib library.
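As a sketch of the weighting scheme this project computes, here is TF-IDF in plain Python. The IDF formula shown, log((n + 1) / (df + 1)), follows the smoothed form used by Spark MLlib's IDF estimator; the document set and function name are invented for illustration, and no Spark installation is assumed.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Compute a TF-IDF weight for each (document, term) pair.

    documents: list of token lists. Returns one dict of term -> weight
    per document. Sketches what an MLlib HashingTF + IDF pipeline
    produces, minus the feature hashing.
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        tf = Counter(doc)  # raw term frequency within this document
        weights.append({t: tf[t] * math.log((n_docs + 1) / (df[t] + 1))
                        for t in tf})
    return weights

docs = [["spark", "fast", "spark"], ["hadoop", "fast"]]
w = tf_idf(docs)
```

A term appearing in every document ("fast" here) gets weight 0, while a term concentrated in one document ("spark") is weighted up in proportion to its frequency there.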
Experiment tracking server focused on speed and scalability
Spark GraphX examples for distributed graph computation, also covering Spark SQL, Spark Streaming, and RDD operations.
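To illustrate the kind of computation GraphX distributes, here is an iterative PageRank over an edge list in plain Python. The computation mirrors GraphX's standalone PageRank (rank = (1 - d) + d * contributions); the graph and function name are invented for the example, and no Spark installation is assumed.

```python
from collections import Counter

def pagerank(edges, num_iters=10, damping=0.85):
    """Iterative PageRank over a directed edge list.

    edges: list of (src, dst) pairs. In GraphX the same message-passing
    loop runs over vertex and edge partitions spread across a cluster.
    """
    nodes = {n for edge in edges for n in edge}
    out_deg = Counter(src for src, _ in edges)
    rank = {n: 1.0 for n in nodes}
    for _ in range(num_iters):
        # Each vertex sends rank / out_degree along its out-edges.
        contrib = {n: 0.0 for n in nodes}
        for src, dst in edges:
            contrib[dst] += rank[src] / out_deg[src]
        rank = {n: (1 - damping) + damping * contrib[n] for n in nodes}
    return rank

edges = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")]
ranks = pagerank(edges)
```

Here "c" receives contributions from both "a" and "b", so it ends up ranked above "b", which only receives half of "a"'s rank.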
Fully managed Apache Parquet implementation
This project includes both diabetes prediction using machine learning algorithms and graph analysis using Neo4j; see the report for details.
The Proxima platform.
Dataproc templates and pipelines for solving simple in-cloud data tasks
Simple and Distributed Machine Learning
Data transformation framework for ETL processing with SQL-like syntax and GIS extensions, based on Apache Spark
Curated list of awesome tools and libraries for specific domains.
Created by Matei Zaharia
Released May 26, 2014