Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes. (Go, updated Jun 10, 2024)
Apache Spark is an open source distributed general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
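The "implicit data parallelism" mentioned above can be sketched in plain Python: a word count expressed in the map/reduce style that Spark runs partition-by-partition across a cluster. This is an illustrative sketch only, with no Spark installation assumed; the partition layout is invented for the example.

```python
from collections import Counter
from functools import reduce

# Simulated partitions of an input text dataset. In Spark, each
# partition would live on a different executor in the cluster.
partitions = [
    ["spark makes big data simple", "big data needs parallelism"],
    ["spark provides fault tolerance"],
]

# "map" phase: count words independently within each partition.
mapped = [Counter(word for line in part for word in line.split())
          for part in partitions]

# "reduce" phase: merge the per-partition counts, as Spark's
# reduceByKey would do with a shuffle.
word_counts = reduce(lambda a, b: a + b, mapped)

print(word_counts["spark"])  # "spark" appears in two lines -> 2
```

Because the per-partition step is a pure function, the framework can schedule it anywhere and re-run it on failure, which is where the fault tolerance comes from.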
Open source platform for the machine learning lifecycle
Data analysis activities.
An interactive real-time web application, developed for the IBM Advanced Data Science specialization, that uses LSTM networks in TensorFlow to predict stock market trends for global companies.
lakeFS - Data version control for your data lake | Git for data
Don't Panic. This guide will help you when it feels like the end of the world.
Data analysis project using Azure, Apache Spark, and Python to process Tokyo Olympic data.
Big Data RDF Processing and Analytics Stack built on Apache Spark and Apache Jena http://sansa-stack.github.io/SANSA-Stack/
An Apache Spark application to analyze word frequencies and compute TF-IDF weights across multiple text file sets using Spark's MLlib library.
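As a sketch of the weighting scheme this project computes, here is TF-IDF in plain Python. The IDF formula shown, log((n + 1) / (df + 1)), follows the smoothed form used by Spark MLlib's IDF estimator; the document set and function name are invented for illustration, and no Spark installation is assumed.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Compute a TF-IDF weight for each (document, term) pair.

    documents: list of token lists. Returns one dict of term -> weight
    per document. Sketches what an MLlib HashingTF + IDF pipeline
    produces, minus the feature hashing.
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        tf = Counter(doc)  # raw term frequency within this document
        weights.append({t: tf[t] * math.log((n_docs + 1) / (df[t] + 1))
                        for t in tf})
    return weights

docs = [["spark", "fast", "spark"], ["hadoop", "fast"]]
w = tf_idf(docs)
```

A term appearing in every document ("fast" here) gets weight 0, while a term concentrated in one document ("spark") is weighted up in proportion to its frequency there.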
Experiment tracking server focused on speed and scalability
Spark GraphX examples for distributed graph computation, also covering Spark SQL, Spark Streaming, and RDD operations.
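To illustrate the kind of computation GraphX distributes, here is an iterative PageRank over an edge list in plain Python. The computation mirrors GraphX's standalone PageRank (rank = (1 - d) + d * contributions); the graph and function name are invented for the example, and no Spark installation is assumed.

```python
from collections import Counter

def pagerank(edges, num_iters=10, damping=0.85):
    """Iterative PageRank over a directed edge list.

    edges: list of (src, dst) pairs. In GraphX the same message-passing
    loop runs over vertex and edge partitions spread across a cluster.
    """
    nodes = {n for edge in edges for n in edge}
    out_deg = Counter(src for src, _ in edges)
    rank = {n: 1.0 for n in nodes}
    for _ in range(num_iters):
        # Each vertex sends rank / out_degree along its out-edges.
        contrib = {n: 0.0 for n in nodes}
        for src, dst in edges:
            contrib[dst] += rank[src] / out_deg[src]
        rank = {n: (1 - damping) + damping * contrib[n] for n in nodes}
    return rank

edges = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")]
ranks = pagerank(edges)
```

Here "c" receives contributions from both "a" and "b", so it ends up ranked above "b", which only receives half of "a"'s rank.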
Fully managed Apache Parquet implementation
This project includes both diabetes prediction using machine learning algorithms and graph analysis using Neo4j; see the report for details.
The Proxima platform.
Dataproc templates and pipelines for solving simple in-cloud data tasks
Simple and Distributed Machine Learning
Data transformation framework for ETL processing with SQL-like syntax and GIS extensions, based on Apache Spark
Curated list of awesome tools and libraries for specific domains.
Created by Matei Zaharia
Released May 26, 2014