data-caterer-example

Data Catering

Data Caterer is a metadata-driven data generation tool that helps create production-like data across batch and event data systems. Run data validations to ensure your systems have ingested the data as expected. Set up and customise runs with the Java API, Scala API, or YAML files, all of which run via Docker.

This repo contains example Java and Scala API usage for Data Caterer.

How

See the detailed documentation found here for more information.

Java

  1. Create a new Java class similar to DocumentationJavaPlanRun.java
    1. It must extend io.github.datacatering.datacaterer.javaapi.api.PlanRun

Scala

  1. Create a new Scala class similar to DocumentationPlanRun.scala
    1. It must extend io.github.datacatering.datacaterer.api.PlanRun

Run

Requires:

  • Docker

./run.sh
# check results in docker/sample/report/index.html
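
Since Docker is the only stated requirement, a pre-flight check can catch a missing install before run.sh starts. The following is a minimal sketch assuming a POSIX shell; `require_cmd` is a hypothetical helper and is not part of this repo.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the named command is on the PATH.
require_cmd() {
  command -v "$1" >/dev/null 2>&1
}
```

It could be used as `require_cmd docker && ./run.sh`, so that run.sh is only invoked when the docker command is found. Note that this checks only that the command exists, not that the Docker daemon is running.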

Docker

Create your own Docker image via:

./gradlew clean build
docker build -t <my_image_name>:<my_image_tag> .
docker run -e PLAN_CLASS=io.github.datacatering.plan.DocumentationPlanRun -v ${PWD}/docs/run:/opt/app/data <my_image_name>:<my_image_tag>
# check results under the docs/run folder
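
The image name, tag, and plan class in the commands above are placeholders to fill in. As a rough sketch, assuming a POSIX shell, a small function can assemble the docker run command from those three values so it can be reviewed before running. `docker_run_command` is a hypothetical helper, not something this repo provides.

```shell
#!/bin/sh
# Hypothetical helper: print the docker run command for a given
# image name ($1), image tag ($2), and plan class ($3).
# The volume mount mirrors the command in this README.
docker_run_command() {
  echo "docker run -e PLAN_CLASS=$3 -v $PWD/docs/run:/opt/app/data $1:$2"
}
```

For example, `docker_run_command my_image 0.1 io.github.datacatering.plan.DocumentationPlanRun` prints the full command for inspection. The sketch does not quote the mount path, so it assumes the working directory contains no spaces.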

Docker Compose

Run with your own class from either the Java or Scala API:

./gradlew clean build
cd docker
PLAN_CLASS=io.github.datacatering.plan.DocumentationPlanRun DATA_SOURCE=postgres docker-compose up -d datacaterer

Further details are available in the docs.
A Docker Compose sample can be found under the docker folder.

cd docker
docker-compose up -d datacaterer

Check the results here.

Change to another data source by setting DATA_SOURCE to one of:

  • postgres
  • mysql
  • cassandra
  • solace
  • kafka
  • http

For example:

DATA_SOURCE=cassandra docker-compose up -d datacaterer
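
A typo in DATA_SOURCE is easy to miss, so it can help to validate the value against the list above before calling docker-compose. This is a minimal sketch assuming a POSIX shell; `is_supported_source` and `compose_command` are hypothetical helpers, and the list of sources is copied from this README rather than read from the tool itself.

```shell
#!/bin/sh
# Data sources listed in this README.
SUPPORTED_SOURCES="postgres mysql cassandra solace kafka http"

# Hypothetical helper: succeeds only if $1 is one of the listed data sources.
is_supported_source() {
  for source in $SUPPORTED_SOURCES; do
    [ "$source" = "$1" ] && return 0
  done
  return 1
}

# Hypothetical helper: print the compose command for a data source,
# or report an error on stderr and return non-zero.
compose_command() {
  if is_supported_source "$1"; then
    echo "DATA_SOURCE=$1 docker-compose up -d datacaterer"
  else
    echo "unsupported data source: $1" >&2
    return 1
  fi
}
```

Calling `compose_command kafka` prints the command to run from the docker folder, while an unlisted value such as `compose_command oracle` fails with an error message instead.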

Helm

helm install data-caterer ./data-caterer-example/helm/data-caterer

Benchmarks

Base benchmark tests can be run via:

bash benchmark/run_benchmark.sh

Results can be found under benchmark/results.