All In One Custom Docker Image for Streaming/Real Time Data Preprocessing Developer on Apache Samza with Kafka

George Jen, Jen Tek LLC

We have built a custom docker image that includes everything a data engineering developer would need:

CentOS 8 image
Python3
Java Development Toolkit 1.8
Jupyter-notebook server to run Python and Scala Spylon Kernel from the host
ssh-server for the ease of connecting to the container using ssh and scp, as oppose using docker exec and docker cp
Apache Spark
Graphframes for Apache Spark for Graph