
Benchmark the IO performance of Apache Spark in the context of Astro data

astrolabsoftware/sparkioref


sparkioref

Benchmark the IO performance of Apache Spark (Scala/Python). Currently supported formats: CSV, JSON, Parquet, and FITS.

Run the benchmark

Edit run_benchmark.sh to match your data and cluster configuration, then launch it with:

./run_benchmark.sh

Example

Configuration:

  • Spark 2.3.1
  • HDFS 2.8.4
  • Input dataset: 1,100,000,000 objects (x, y, z)
  • 153 cores (9 executors), 300 GB RAM total
  • No cache: 100 iterations (data distributed and read)
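The "no cache" measurement above can be sketched roughly as follows. This is a minimal illustration, not the code shipped in this repository: the helper names (`time_iterations`, `summarize`, `spark_read_action`) are hypothetical, and the Spark action assumes an existing PySpark `SparkSession` and a Parquet dataset at a path you supply. Each iteration builds a fresh DataFrame and never calls `.cache()`, so every pass has to go back to storage.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_iterations(action: Callable[[], object], iterations: int = 100) -> List[float]:
    """Run `action` `iterations` times; return wall-clock durations in seconds."""
    durations = []
    for _ in range(iterations):
        start = time.perf_counter()
        action()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> Dict[str, float]:
    """Basic summary statistics over the measured durations."""
    return {
        "n": len(durations),
        "mean": statistics.mean(durations),
        "median": statistics.median(durations),
        "min": min(durations),
        "max": max(durations),
    }


def spark_read_action(spark, path: str, fmt: str = "parquet") -> Callable[[], int]:
    """Build an action that re-reads `path` and counts its rows (hypothetical helper).

    `spark` is assumed to be a pyspark.sql.SparkSession created elsewhere.
    """
    def action() -> int:
        # New DataFrame on every call and no .cache(): the data is read again each time.
        return spark.read.format(fmt).load(path).count()
    return action


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without a cluster.
    stats = summarize(time_iterations(lambda: sum(range(10_000)), iterations=5))
    print(stats)
```

With a live session, the stand-in lambda would be replaced by something like `spark_read_action(spark, "hdfs:///path/to/data.parquet")`, where the path is a placeholder for your own dataset.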
