All-in-one custom and comprehensive Docker image for the data engineering developer on Apache Spark

George Jen, Jen Tek LLC

Introduction

We want to build a custom Docker image that includes everything a data engineering developer would need:

CentOS 8 image
Python3
Java Development Kit (JDK) 1.8
Jupyter Notebook server to run Python from the host
SSH server for easy connection to the container using ssh and scp, as opposed to using docker exec and docker cp
Apache Spark
GraphFrames for graph processing on Apache Spark
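
Once such an image is built, a quick smoke test run inside the container can confirm that the listed components are actually present. The following is a minimal sketch, not part of the original build. The executable names (python3, java, jupyter, sshd, spark-submit) and the helper name check_tools are assumptions about how each component is usually exposed on PATH, and may need adjusting for the actual image. GraphFrames is left out because it is a Spark package rather than a standalone executable.

```python
import shutil

# Executable names below are assumptions about how each component is
# typically exposed on PATH; adjust them to match the actual image.
EXPECTED_TOOLS = {
    "Python 3": "python3",
    "JDK 1.8": "java",
    "Jupyter Notebook": "jupyter",
    "SSH server": "sshd",
    "Apache Spark": "spark-submit",
}


def check_tools(tools=EXPECTED_TOOLS):
    """Return {component: resolved path or None} for each expected executable."""
    return {name: shutil.which(exe) for name, exe in tools.items()}


if __name__ == "__main__":
    # Print one line per component with its resolved path, if any.
    for name, path in check_tools().items():
        status = path if path else "NOT FOUND"
        print(f"{name:18s} {status}")
```

A component reported as NOT FOUND is not necessarily missing. For example, sshd often lives in /usr/sbin, which may not be on PATH for a non-root user, and spark-submit is only found if Spark's bin directory has been added to PATH.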