OpenMRS Real-time Streaming Topology

[DEPRECATED] Note that this repo has been moved to https://github.com/kimaina/openmrs-elt

This project aims to make it possible to process data in real time from various sources such as OpenMRS, EID, etc.

Requirements

Make sure you have the latest versions of Docker and Docker Compose:

Install Docker.
Install Docker Compose.
Clone this repository.

Getting started

You only have to run 3 commands to get the entire cluster running. Open your terminal and run these commands:

# this will start the cluster's containers (mysql, kafka, connect (dbz), openmrs, zookeeper, portainer and cAdvisor)
# cd /media/sf_akimaina/openmrs-etl
export DEBEZIUM_VERSION=0.8
docker-compose -f docker-compose.yaml up

# Start MySQL connector (VERY IMPORTANT)
curl -i -X POST -H "Accept:application/json" -H  "Content-Type:application/json" http://localhost:8083/connectors/ -d @register-mysql.json
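
The same registration call can be scripted, for example from a startup script that retries until Kafka Connect is up. Below is a minimal Python sketch, assuming the endpoint and headers from the curl command above. The function names `build_register_request` and `register_from_file` are hypothetical, and nothing is sent until `register_from_file` is called.

```python
import json
import urllib.request

# Same Kafka Connect endpoint as the curl command above.
CONNECT_URL = "http://localhost:8083/connectors/"


def build_register_request(config: dict, url: str = CONNECT_URL) -> urllib.request.Request:
    """Build (but do not send) the POST request that registers a connector."""
    body = json.dumps(config).encode("utf-8")
    headers = {"Accept": "application/json", "Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def register_from_file(path: str = "register-mysql.json") -> int:
    """POST the contents of the JSON file to Kafka Connect and return the HTTP status."""
    with open(path, encoding="utf-8") as fh:
        config = json.load(fh)
    with urllib.request.urlopen(build_register_request(config)) as resp:
        return resp.status
```

Separating request construction from sending makes it possible to inspect the request without a running cluster.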


# Realtime streaming and processing
Please use Spark (Scala), PySpark, or KSQL. For this project, I'll demo using KSQL.

To keep containers from crashing (exit code 137, which typically means the container was killed for running out of memory), increase the memory and CPU of your Docker VM to more than 8 GB and more than 4 cores, as shown in the figure below
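
As a quick illustration of where 137 comes from: by shell convention, an exit code above 128 means the process was terminated by signal number (code - 128), and 137 - 128 = 9 is SIGKILL, which is what the kernel sends when it reclaims memory. The helper below is a hypothetical sketch for reading such codes, not part of this repository.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain a container exit code using the 128 + signal-number convention."""
    if code == 0:
        return "exited normally"
    if code > 128:
        try:
            # Codes above 128 encode the terminating signal as (code - 128).
            return f"terminated by {signal.Signals(code - 128).name}"
        except ValueError:
            pass  # not a valid signal number on this platform
    return f"exited with error {code}"


print(describe_exit_code(137))  # 137 - 128 = 9, i.e. SIGKILL
```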

If everything runs as expected, you should see all of these containers running:

You can view them in Portainer at http://localhost:9000

OpenMRS

The OpenMRS application will eventually be accessible at http://localhost:8080/openmrs. Credentials for the shipped demo data:

Username: admin
Password: Admin123

Example Batch using Jupyter Notebook (Spark Standalone Mode)

conda install pyspark=2.4.5

jupyter notebook encounter_job.ipynb

Spark Master and Worker Nodes

Master Node: http://localhost:4040/
Worker Node 1: http://localhost:8100/
Worker Node 2: http://localhost:8200/
Worker Node 3: http://localhost:8300/
Worker Node 4: http://localhost:8400/

Based on: https://github.com/big-data-europe/docker-spark/blob/master/README.md

For Spark on Kubernetes deployment: https://github.com/big-data-europe/docker-spark/blob/master/README.md

Docker Container Manager: Portainer

http://localhost:9000

MySQL client

docker-compose -f docker-compose.yaml exec mysql bash -c 'mysql -u $MYSQL_USER -p$MYSQL_PASSWORD inventory'

Schema Changes Topic

docker-compose -f docker-compose.yaml exec kafka /kafka/bin/kafka-console-consumer.sh \
    --bootstrap-server kafka:9092 \
    --from-beginning \
    --property print.key=true \
    --topic schema-changes.openmrs

How to Verify MySQL connector (Debezium)

curl -H "Accept:application/json" localhost:8083/connectors/
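
The Kafka Connect REST API answers this request with a JSON array of registered connector names. A minimal sketch for checking that response in a script follows; the function name and the connector name `openmrs-connector` are hypothetical, since the actual name is whatever register-mysql.json defines.

```python
import json


def is_connector_registered(response_body: str, name: str) -> bool:
    """Return True if `name` appears in the JSON array returned by GET /connectors."""
    connectors = json.loads(response_body)
    return name in connectors


# Hypothetical response body, as if one connector had been registered.
print(is_connector_registered('["openmrs-connector"]', "openmrs-connector"))
```

An empty array (`[]`) means the registration step has not succeeded yet.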

Shut down the cluster

docker-compose -f docker-compose.yaml down

Debezium Topics

Consume messages from a Debezium topic (obs, encounter, person, etc.)

All you have to do is change the topic to --topic dbserver1.openmrs.<table>, where <table> is the name of the table you want to follow (obs in the command below).

   docker-compose -f docker-compose.yaml exec kafka /kafka/bin/kafka-console-consumer.sh \
    --bootstrap-server kafka:9092 \
    --from-beginning \
    --property print.key=true \
    --topic dbserver1.openmrs.obs
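
Each message on these topics is a Debezium change event. The sketch below shows how the topic name is put together and how a message value could be unpacked in Python. It assumes the usual Debezium envelope (`before`, `after`, `op`, optionally nested under `payload` when schemas are enabled). The function names and the sample row are hypothetical, and only the `c`, `u` and `d` operation codes are mapped; any other code is passed through unchanged.

```python
import json
from typing import Optional

SERVER_NAME = "dbserver1"  # logical server name, as in the topic above
DATABASE = "openmrs"

# Operation codes in the Debezium envelope; unknown codes are returned as-is.
OPERATIONS = {"c": "create", "u": "update", "d": "delete"}


def topic_for(table: str) -> str:
    """Build the topic name <server>.<database>.<table>."""
    return f"{SERVER_NAME}.{DATABASE}.{table}"


def parse_change_event(value: Optional[str]) -> Optional[dict]:
    """Extract the operation and row images from one message value.

    Returns None for a null value (the tombstone that can follow a delete).
    """
    if value is None:
        return None
    event = json.loads(value)
    # With schemas enabled, the envelope is nested under "payload".
    payload = event.get("payload", event)
    op = payload.get("op")
    return {
        "operation": OPERATIONS.get(op, op),
        "before": payload.get("before"),
        "after": payload.get("after"),
    }


# Hypothetical sample event for an inserted row.
sample = json.dumps({"payload": {"before": None, "after": {"obs_id": 1}, "op": "c"}})
print(topic_for("encounter"))  # dbserver1.openmrs.encounter
print(parse_change_event(sample))
```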

Consume messages using KSQL

Start KSQL CLI

  docker run --network openmrs-etl_default --rm --interactive --tty \
      confluentinc/cp-ksql-cli:5.2.2 \
      http://ksql-server:8088

After running the above command, you will be presented with an interactive KSQL CLI

Run KSQL Streaming SQL

You can run any KSQL streaming SQL command, as described here: https://docs.confluent.io/current/ksql/docs/tutorials/index.html. Here is an example:

  SHOW TOPICS;
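
Statements like this one can also be submitted without the CLI, through the KSQL server's REST endpoint (`POST /ksql`). The following is a hedged sketch only: it assumes port 8088 of the ksql-server container is reachable from where the script runs (the URL below is an assumption, not something this repository documents), and `build_ksql_request` is a hypothetical helper that builds the request without sending it.

```python
import json
import urllib.request

# Assumption: port 8088 of ksql-server is published to the host.
KSQL_URL = "http://localhost:8088/ksql"


def build_ksql_request(statement: str, url: str = KSQL_URL) -> urllib.request.Request:
    """Build (but do not send) a request that submits one KSQL statement."""
    body = json.dumps({"ksql": statement, "streamsProperties": {}}).encode("utf-8")
    headers = {
        "Content-Type": "application/vnd.ksql.v1+json; charset=utf-8",
        "Accept": "application/vnd.ksql.v1+json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


request = build_ksql_request("SHOW TOPICS;")
# To actually send it: urllib.request.urlopen(request).read()
```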

For more KSQL streaming commands, please visit https://docs.confluent.io/current/ksql

Cluster Design Architecture

This section attempts to explain how the clusters work by breaking everything down.
Everything here has been dockerized, so you don't need to perform these steps yourself.
- Kafka Cluster: Kafka
- Spark Cluster: Spark
- Debezium Layer: Debezium

Directory Structure

project
│   README.md
│   kafka.md
│   debezium.md
│   spark.md
│   docker-compose.yaml
│
└───template
    │   java
    │   python
    │   scala
    └───subfolder1
        │   file111.txt
        │   file112.txt
        │   ...

Writing batch/streaming jobs

Java Template: Java
Python Template: Python
Scala Template: Scala

Based on: https://github.com/big-data-europe/docker-spark/blob/master/README.md
