Ceph is a distributed object, block, and file storage platform
Updated Jun 1, 2024 - C++
SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lakes, built to handle billions of files. The blob store offers O(1) disk seek and cloud tiering. The Filer supports Cloud Drive, cross-DC active-active replication, Kubernetes, POSIX FUSE mount, S3 API, S3 Gateway, Hadoop, WebDAV, encryption, and Erasure Coding.
JuiceFS is a distributed POSIX file system built on top of Redis and S3.
Kafka Connect HDFS connector
Utils for streaming large files (S3, HDFS, gzip, bz2...)
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Addax is a versatile open-source ETL tool that can seamlessly transfer data between various RDBMS and NoSQL databases, making it an ideal solution for data migration.
Fundamentals of Spark with Python (using PySpark), code examples
The Universal Storage Engine
DC/OS SDK is a collection of tools, libraries, and documentation for easy integration of technologies such as Kafka, Cassandra, HDFS, Spark, and TensorFlow with DC/OS.
Web tool for Kafka Connect