All-in-One Custom Docker Image for Streaming/Real-Time Data Preprocessing Developers on Apache Samza with Kafka

$ docker load < bigdata.tgz
$ docker load < etl.tgz
$ docker load < docker_samza.tar.gz
$ docker image ls

REPOSITORY TAG IMAGE ID CREATED SIZE
jentekllc/etl latest c3b44db4c5d6 30 hours ago 6.61GB
jentekllc/bigdata latest c7d3fb6d8221 31 hours ago 5.53GB
jentekllc/samza latest e9086c0ddaab 2 days ago 12.3GB
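
Before starting the containers, it can help to confirm that all three images actually loaded. The following is a minimal sketch, not part of the original walkthrough: `missing_images` is a hypothetical helper name, and it only inspects a `docker image ls` listing passed on standard input.

```shell
# Sketch: print every expected repository that is absent from a
# `docker image ls` listing read from standard input.
# `missing_images` is a hypothetical name, not something the images provide.
missing_images() {
  listing=$(cat)
  for repo in jentekllc/etl jentekllc/bigdata jentekllc/samza; do
    # The repository name must start the line and be followed by whitespace.
    if ! printf '%s\n' "$listing" | grep -q "^$repo[[:space:]]"; then
      printf '%s\n' "$repo"
    fi
  done
}

# Usage (requires a running Docker daemon):
#   docker image ls | missing_images
```

No output means all three repositories appear in the listing.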
$ tar -xf additional_files.tar
$ nohup docker-compose -p j up --scale spark-worker=3 &
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
034cde65b534 nginx:latest "/docker-entrypoint.…" About an hour ago Up About an hour 80/tcp, 0.0.0.0:5000->5000/tcp nginx-lb
26938b83703c jentekllc/bigdata:latest "/run_sshd_worker.sh" About an hour ago Up About an hour 22/tcp, 0.0.0.0:50864->38080/tcp j_spark-worker_3
88b9ada1ab4c jentekllc/bigdata:latest "/run_sshd_worker.sh" About an hour ago Up About an hour 22/tcp, 0.0.0.0:50863->38080/tcp j_spark-worker_1
2f18f6b5f9a9 jentekllc/bigdata:latest "/run_sshd_worker.sh" About an hour ago Up About an hour 22/tcp, 0.0.0.0:50862->38080/tcp j_spark-worker_2
e497e89b4e67 jentekllc/samza:latest "/home/hadoop/start_…" About an hour ago Up About an hour 0.0.0.0:50022->22/tcp, 0.0.0.0:58088->8088/tcp, 0.0.0.0:58888->8888/tcp, 0.0.0.0:58889->8889/tcp samza-server
23e310732fe2 jentekllc/etl:latest "/start_etl.sh" About an hour ago Up About an hour 0.0.0.0:40022->22/tcp, 0.0.0.0:48080->8080/tcp, 0.0.0.0:48888->8888/tcp, 0.0.0.0:48889->8889/tcp, 0.0.0.0:49000->9000/tcp, 0.0.0.0:40000->30000/tcp etl-server
6d3c81c0d6ff jentekllc/bigdata:latest "/run_sshd_master.sh" About an hour ago Up About an hour 0.0.0.0:4040->4040/tcp, 0.0.0.0:7077->7077/tcp, 0.0.0.0:8080->8080/tcp, 0.0.0.0:8088->8088/tcp, 0.0.0.0:8888-8889->8888-8889/tcp, 0.0.0.0:20022->22/tcp spark-master
73ce7dd06186 jentekllc/bigdata:latest "/run_sshd_hive.sh" About an hour ago Up About an hour 0.0.0.0:30022->22/tcp, 0.0.0.0:38088->8088/tcp, 0.0.0.0:39000->9000/tcp, 0.0.0.0:39083->9083/tcp
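
To confirm that `--scale spark-worker=3` took effect, the worker containers can be counted by name. This is a sketch under assumptions: `count_workers` is a hypothetical helper, and the `j_spark-worker_N` pattern is taken from the listing above (newer Compose releases may join the name parts with hyphens instead of underscores).

```shell
# Sketch: count container names of the form j_spark-worker_<N>
# read from standard input, one name per line.
# `count_workers` is a hypothetical helper name.
count_workers() {
  # grep -c prints the number of matching lines; `|| true` keeps the
  # exit status at zero when the count is 0.
  grep -c '^j_spark-worker_[0-9][0-9]*$' || true
}

# Usage (requires a running Docker daemon):
#   docker ps --format '{{.Names}}' | count_workers
```

With the scale factor used above, the expected count is 3.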
$ docker-compose -p j down
http://localhost:8080
http://localhost:5000
http://localhost:8888
http://localhost:48888
http://localhost:48080
http://localhost:49000
http://localhost:58088
http://localhost:58888
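
Rather than opening each address in a browser, the ports can be probed from the host in one pass. This is a rough sketch: `check_uis` is a hypothetical helper name, the port list is copied from the URLs above, and the 5-second timeout is an arbitrary choice.

```shell
# Sketch: request each web UI address listed above and report
# whether anything answered on that port.
# `check_uis` is a hypothetical helper name.
check_uis() {
  for port in 8080 5000 8888 48888 48080 49000 58088 58888; do
    url="http://localhost:$port"
    # -s hides progress output, -o /dev/null discards the body,
    # --max-time bounds the wait. Any HTTP response counts as "up",
    # including error status codes.
    if curl -s -o /dev/null --max-time 5 "$url"; then
      printf 'up    %s\n' "$url"
    else
      printf 'down  %s\n' "$url"
    fi
  done
}

# Usage, from the host while the containers are running:
#   check_uis
```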
#From host
$ ssh -p 20022 hadoop@localhost
#Inside container
$ cd $SPARK_HOME
$ bin/spark-submit /spark/examples/src/main/python/pi.py
$ spark-sql
2021-11-06 23:26:42,842 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
Spark master: local[*], Application Id: local-1636241205417
spark-sql>
$ ssh -p 40022 hadoop@localhost
$ ssh -p 50022 hadoop@localhost
$ jps
144 ResourceManager
1233 ClusterBasedJobCoordinatorRunner
1729 LocalContainerRunner
1848 LocalContainerRunner
1689 LocalContainerRunner
586 Kafka
1946 Jps
426 NodeManager
76 QuorumPeerMain
1324 ClusterBasedJobCoordinatorRunner
1421 ClusterBasedJobCoordinatorRunner
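
The `jps` listing is easier to read when grouped by process name. A small sketch, where `summarize_jps` is a hypothetical helper that reads `jps` output on standard input:

```shell
# Sketch: count JVM processes per main class name from `jps` output.
# `summarize_jps` is a hypothetical helper name.
summarize_jps() {
  # Field 1 is the PID and field 2 the class name; the Jps tool itself
  # is skipped. Results are sorted by name for stable output.
  awk 'NF >= 2 && $2 != "Jps" { n[$2]++ } END { for (k in n) print n[k], k }' | sort -k2
}

# Usage, inside the samza-server container:
#   jps | summarize_jps
```

For the listing above, this would group the three `ClusterBasedJobCoordinatorRunner` and three `LocalContainerRunner` entries into one line each.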
$ file * | grep directory
hello-samza: directory
samza: directory

I am the founder of Jen Tek LLC, a startup in East Bay, California, developing AI-powered, cloud-based documentation/publishing software as a service.

George Jen
