An all-in-one, comprehensive custom Docker image for the data engineering developer working with Apache Spark.

$ docker load < bigdata.tgz
$ docker image ls
REPOSITORY          TAG       IMAGE ID       CREATED       SIZE
jentekllc/bigdata   latest    b2b671d197f7   4 hours ago   5.51GB
$ tar -xf  bigdata_docker.tar
$ nohup docker-compose -p j up --scale spark-worker=3 &
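The `--scale spark-worker=3` flag works because the worker service in the compose file has no fixed container name or fixed host port. Below is a minimal sketch of what such a docker-compose.yml might look like; the image tags, entry scripts, and port numbers are taken from the `docker ps` output further down, but everything else is an assumption, and the actual compose file shipped in bigdata_docker.tar may differ:

```yaml
# Sketch only -- not the actual file from bigdata_docker.tar.
services:
  spark-master:
    image: jentekllc/bigdata:latest
    command: /run_sshd_master.sh
    ports:
      - "7077:7077"    # Spark master RPC
      - "8080:8080"    # Spark master web UI
      - "20022:22"     # SSH into the master container
  spark-worker:
    image: jentekllc/bigdata:latest
    command: /run_sshd_worker.sh
    ports:
      - "38080"        # worker UI on an ephemeral host port, so --scale works
  nginx-lb:
    image: nginx:latest
    ports:
      - "5000:5000"    # load balancer in front of the worker UIs
```

Because the worker publishes port 38080 without pinning a host port, Docker assigns each scaled replica its own ephemeral port (49850–49852 in the output below), avoiding collisions.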
$ docker ps
CONTAINER ID   IMAGE                      COMMAND                  CREATED              STATUS              PORTS                                                                                                     NAMES
f23c2863e235   nginx:latest               "/docker-entrypoint.…"   About a minute ago   Up 56 seconds       80/tcp, 0.0.0.0:5000->5000/tcp                                                                            nginx-lb
1cb418088d2c   jentekllc/bigdata:latest   "/run_sshd_worker.sh"    About a minute ago   Up 57 seconds       22/tcp, 0.0.0.0:49851->38080/tcp                                                                          j-spark-worker-3
997537fb1887   jentekllc/bigdata:latest   "/run_sshd_worker.sh"    About a minute ago   Up 57 seconds       22/tcp, 0.0.0.0:49852->38080/tcp                                                                          j-spark-worker-1
61bd4afc30a0   jentekllc/bigdata:latest   "/run_sshd_worker.sh"    About a minute ago   Up 58 seconds       22/tcp, 0.0.0.0:49850->38080/tcp                                                                          j-spark-worker-2
16a493eb513d   jentekllc/bigdata:latest   "/run_sshd_master.sh"    About a minute ago   Up About a minute   0.0.0.0:7077->7077/tcp, 0.0.0.0:8080->8080/tcp, 0.0.0.0:8888-8889->8888-8889/tcp, 0.0.0.0:20022->22/tcp   spark-master
2707ab560407   jentekllc/bigdata:latest   "/run_sshd_hive.sh"      About a minute ago   Up About a minute   0.0.0.0:9000->9000/tcp, 0.0.0.0:9083->9083/tcp, 0.0.0.0:30022->22/tcp                                     hadoop-hive
$ docker-compose -p j down
Spark master web UI: http://localhost:8080
Spark worker web UIs (load balanced through the nginx-lb container): http://localhost:5000
Jupyter Notebook (port 8888 on the master container): http://localhost:8888
$ ssh -p 20022 hadoop@localhost
$ cd $SPARK_HOME
$ bin/spark-submit /spark/examples/src/main/python/pi.py
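pi.py is the stock Spark example that estimates π by Monte Carlo: it scatters random points over the unit square and counts how many fall inside the quarter circle. The core idea, stripped of Spark and written as plain Python (a sketch of the technique, not the actual example file, which distributes the same loop across the cluster as an RDD):

```python
import random

def estimate_pi(num_samples: int, seed: int = 42) -> float:
    """Monte Carlo estimate of pi: the fraction of random points in the
    unit square that land inside the quarter circle, times 4."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / num_samples

print(estimate_pi(1_000_000))  # converges toward 3.14159... as samples grow
```

Spark's version parallelizes the sampling loop over the workers and sums the per-partition counts, which is why it makes a good first job to verify the cluster is healthy.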


I am the founder of Jen Tek LLC, a startup in the East Bay, California, developing AI-powered, cloud-based documentation/publishing software as a service.

George Jen