# Akka vs Pekko
As of umbrella release 22.10, Lightbend has changed the licensing model. Apache Pekko is the open source alternative. A BIG thank you to the committed Pekko committers. For now, the branch `migrate_pekko` contains a first basic migration (with a few losses). Currently, this is the only maintained branch. The plan is to move the content of this branch to a new `pekko_tutorial` repo.
"It's working!" a colleague used to shout across the office when yet another proof of concept was running its first few hundred meters along the happy path, aware that the real work started right there. This repo contains a collection of runnable and self-contained examples from various Pekko Streams and Pekko Connectors tutorials, blogs and postings to provide you with exactly this feeling. See the class comment on how to run each example. These more complex examples are described below:
- Element deduplication
- Windturbine example
- Apache Kafka WordCount
- HL7 V2 over TCP via Kafka to Websockets
- Analyse Wikipedia edits live stream
- Movie subtitle translation via OpenAI API
Many of the examples deal with some kind of (shared) state. While most Pekko Streams operators are stateless, the samples in package `sample.stream_shared_state` also show some trickier stateful operators in action.
Other noteworthy examples:
- The `*Echo` example series implements round trips, e.g. `HttpFileEcho` and `WebsocketEcho`
- The branch `grpc` contains the basic gRPC examples and a chunked file upload. Use `sbt compile` or `Rebuild Project` in IDEA to re-generate the sources via the `sbt-pekko-grpc` plugin.
Remarks:
- Java 17 language level is kept, hence run with a recent JDK 17 or higher. To speed things up, graalvm-jdk-21 works best.
- Most examples are throttled, so you can see from the console output what is happening
- Some examples deliberately throw `RuntimeException`, so you can observe recovery behaviour
- Using testcontainers allows running realistic scenarios (e.g. `SSEtoElasticsearch`, `KafkaServerTestcontainers`, `SlickIT`)
Other resources:
- Maintained examples are in pekko-stream-tests and the Streams Cookbook
- Getting started guides: the Streams Quickstart Guide and this popular Stack Overflow article
- The doc chapters Modularity, Composition and Hierarchy and Design Principles behind Apache Pekko Streams provide useful background
- The concept of running streams using materialized values is also explained in this blog, this video and in this Stack Overflow article
## Element deduplication

Dropping identical (consecutive or non-consecutive) elements in an unbounded stream:
- `DeduplicateConsecutiveElements` using the `sliding` operator
- `Dedupe` shows the squbs `Deduplicate` GraphStage, which can deduplicate both types
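The consecutive case can be sketched in plain Scala. This is a hypothetical illustration (not the repo's implementation): neighbouring elements are compared pairwise, which is what the `sliding(2)` operator enables in a stream setting.

```scala
// Hypothetical sketch: drop consecutive duplicates by comparing
// neighbouring elements, as a sliding window of size 2 allows.
object SlidingDedupeSketch {
  def dropConsecutiveDuplicates[A](in: Seq[A]): Seq[A] =
    // keep the first element, then keep each element that differs
    // from its immediate predecessor
    in.headOption.toSeq ++ in.sliding(2).collect {
      case Seq(prev, next) if prev != next => next
    }
}
```

For example, `dropConsecutiveDuplicates(Seq(1, 1, 2, 2, 3, 1))` yields `Seq(1, 2, 3, 1)`: the trailing non-consecutive duplicate survives, which is why the squbs `Deduplicate` GraphStage is needed for the general case.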
The following use case uses a local caffeine cache to avoid duplicate HTTP file downloads:
- Process a stream of incoming messages with recurring TRACE_ID
- For the first message: download a .zip file from the `FileServer` and add TRACE_ID→Path to the local cache
- For subsequent messages with the same TRACE_ID: fetch the file from the cache to avoid duplicate downloads per TRACE_ID
- Use time-based cache eviction to get rid of old downloads
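The steps above can be sketched with a simplified message model. The repo uses a caffeine cache; here a plain mutable `Map` with manual time-based eviction stands in for it, and all names are illustrative.

```scala
// Minimal sketch of the dedup-by-cache idea, assuming a simplified
// message model; a mutable Map stands in for the caffeine cache.
object LocalCacheSketch {
  final case class Message(traceId: Int)

  class DownloadCache(ttlMillis: Long) {
    // traceId -> (downloaded "path", insertion time)
    private val cache = scala.collection.mutable.Map.empty[Int, (String, Long)]
    var downloads = 0 // counts real downloads, for illustration only

    private def download(id: Int): String = { downloads += 1; s"/tmp/$id.zip" }

    def pathFor(msg: Message, nowMillis: Long): String = {
      // evict entries older than the TTL, then serve from cache or download
      cache.filterInPlace { case (_, (_, insertedAt)) => nowMillis - insertedAt < ttlMillis }
      cache.getOrElseUpdate(msg.traceId, (download(msg.traceId), nowMillis))._1
    }
  }
}
```

Feeding TRACE_IDs 1, 2, 1, 1, 3 through `pathFor` triggers only three downloads; once the TTL has passed, the same TRACE_ID is downloaded again.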
| Class | Description |
|---|---|
| FileServer | Local HTTP FileServer for non-idempotent file download simulation |
| LocalFileCacheCaffeine | Pekko Streams client flow, with cache implemented with caffeine |
## Windturbine example

Working sample from the blog series 1-4 by Colin Breck, where classic Actors are used to model shared state, life-cycle management and fault-tolerance in combination with Pekko Streams. Colin Breck explains these concepts and more in his 2017 Reactive Summit talk Islands in the Stream: Integrating Akka Streams and Akka Actors.
| Class | Description |
|---|---|
| SimulateWindTurbines | Starts n clients which feed measurements to the server |
| WindTurbineServer | Starts a server which accumulates measurements |
The clients communicate with the `WindTurbineServer` via websockets. After a restart of `SimulateWindTurbines`, the clients are able to resume. Shutting down the `WindTurbineServer` results in reporting to the clients that the server is not reachable. After restarting the `WindTurbineServer`, the clients are able to resume. Since there is no persistence, processing just continues.
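The resume behaviour can be sketched as a retry loop with exponential backoff. This is a hedged, self-contained illustration (not the repo's websocket client): `connect` is a stand-in for the websocket handshake, and all names and values are illustrative.

```scala
// Hedged sketch: a client retries connecting with capped exponential
// backoff until the server is reachable again.
object BackoffReconnectSketch {
  def connectWithBackoff(connect: () => Boolean, maxRetries: Int, baseDelayMs: Long): Boolean = {
    var attempt = 0
    while (attempt <= maxRetries) {
      if (connect()) return true
      // double the delay per attempt, capped at 1 second
      Thread.sleep((baseDelayMs * (1L << attempt)) min 1000L)
      attempt += 1
    }
    false
  }
}
```

In the real example, a restart strategy wraps the websocket source, so the loop is managed by the stream rather than written by hand.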
## Apache Kafka WordCount

The ubiquitous word count with an additional message count. A message is a sequence of words. Start the classes in the order below and watch the console output.
| Class | Description |
|---|---|
| KafkaServerEmbedded | Uses Embedded Kafka (an in-memory Kafka instance). No persistence on restart |
| WordCountProducer | pekko-streams-kafka client which feeds random words to topic wordcount-input |
| WordCountKStreams.java | Kafka Streams DSL client to count words and messages and feed the results to topics wordcount-output and messagecount-output. Contains additional interactive queries which should yield the same results as WordCountConsumer |
| WordCountConsumer | pekko-streams-kafka client which consumes aggregated results from topics wordcount-output and messagecount-output |
| DeleteTopicUtil | Utility to reset the offset |
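Conceptually, the aggregation the clients above compute can be sketched as a fold over a stream of messages, maintaining both word counts and a message count. This is an illustration in plain Scala, not the Kafka Streams DSL:

```scala
// Conceptual sketch: count words and messages over a message stream.
object WordCountSketch {
  def countWordsAndMessages(messages: Seq[String]): (Map[String, Int], Int) =
    messages.foldLeft((Map.empty[String, Int], 0)) { case ((counts, n), msg) =>
      // split each message into words and bump each word's count
      val updated = msg.split("\\s+").foldLeft(counts) { (c, w) =>
        c.updated(w, c.getOrElse(w, 0) + 1)
      }
      (updated, n + 1) // one more message seen
    }
}
```

The Kafka Streams version keeps this state in changelog-backed state stores instead of an in-memory `Map`, which is what makes the interactive queries possible.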
## HL7 V2 over TCP via Kafka to Websockets

The PoC in package `alpakka.tcp_to_websockets` is from the e-health domain, relaying HL7 V2 text messages in some kind of "Trophy" across these stages:

Hl7TcpClient → Hl7Tcp2Kafka → KafkaServer → Kafka2Websocket → WebsocketServer

The focus is on resilience (trying not to lose messages during the restart of the stages). However, currently messages may reach the `WebsocketServer` unordered (due to retry in `Hl7TcpClient`), and in-flight messages may get lost (upon restart of `WebsocketServer`).

Start each stage separately in the IDE, or all together via the integration test `AlpakkaTrophySpec`.
## Analyse Wikipedia edits live stream

Find out whose Wikipedia articles were changed in (near) real time by tapping into the Wikipedia Edits stream provided via SSE. The class `SSEtoElasticsearch` implements a workflow which uses the `title` attribute from the SSE entity as identifier to fetch the `extract` from the Wikipedia API, e.g. for Douglas Adams. Text processing on this content using opennlp yields `personsFound`, which are added to the `wikipediaedits` Elasticsearch index. The index is queried periodically, and the content may also be viewed with a browser, e.g.:

http://localhost:{mappedPort}/wikipediaedits/_search?q=personsFound:*
## Movie subtitle translation via OpenAI API

`SubtitleTranslator` translates all blocks of an English source `.srt` file to a target language using the OpenAI API endpoints:
- `/chat/completions` (gpt-3.5-turbo/gpt-4), used by default, see Doc
- `/completions` (gpt-3.5-turbo-instruct), used as fallback, see Doc
Pekko Streams helps in these areas:
- Workflow modelling
- Scene splitting into session windows. All blocks of a scene are grouped in one session and then translated in one API call
- Throttling to not exceed the OpenAI API rate limits
- Continuous writing of translated blocks to the target file to avoid data loss on network failure
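The session-window idea can be sketched in plain Scala. This is a hedged illustration (names, the `Block` model and the gap threshold are all assumptions, not the repo's implementation): subtitle blocks are grouped into one "scene" as long as the gap to the previous block stays below a threshold; a larger gap starts a new window.

```scala
// Hedged sketch of scene splitting via session windows.
object SessionWindowSketch {
  final case class Block(startMs: Long, text: String)

  def sessionWindows(blocks: Seq[Block], maxGapMs: Long): Seq[Seq[Block]] =
    blocks.foldLeft(Vector.empty[Vector[Block]]) { (windows, b) =>
      windows.lastOption match {
        case Some(w) if b.startMs - w.last.startMs <= maxGapMs =>
          windows.init :+ (w :+ b) // small gap: extend the current session window
        case _ =>
          windows :+ Vector(b) // large gap (or first block): start a new scene
      }
    }
}
```

Each resulting window can then be translated with a single API call, which keeps scene context together and reduces the number of requests counted against the rate limit.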