All In One Custom Docker Image for Streaming/Real Time Data Preprocessing Developer on Apache Samza with Kafka

$ docker load < bigdata.tgz
$ docker load < etl.tgz
$ docker load < docker_samza.tar.gz
$ docker image ls

REPOSITORY          TAG       IMAGE ID       CREATED        SIZE
jentekllc/etl       latest    c3b44db4c5d6   30 hours ago   6.61GB
jentekllc/bigdata   latest    c7d3fb6d8221   31 hours ago   5.53GB
jentekllc/samza     latest    e9086c0ddaab   2 days ago     12.3GB
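
Before starting the stack, it can be worth confirming that all three archives actually loaded. The following is a minimal sketch, not part of the original setup: the function name `missing_images` is made up for illustration, and it only assumes that `docker image ls` prints the repository name in the first column, as in the listing above.

```shell
# Hypothetical helper: reads `docker image ls` output on stdin and prints
# each expected repository that does not appear in the first column.
missing_images() {
    image_listing="$(cat)"
    for repo in jentekllc/etl jentekllc/bigdata jentekllc/samza; do
        # awk exits 0 only if some line has the repository as its first field.
        if ! printf '%s\n' "$image_listing" | awk -v r="$repo" '$1 == r { found = 1 } END { exit !found }'; then
            printf '%s\n' "$repo"
        fi
    done
}

# Usage (needs Docker, so it is left commented out here):
#   docker image ls | missing_images
```

An empty result means every expected image is present; any name printed points at an archive that still needs `docker load`.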
$ tar -xf additional_files.tar
$ nohup docker-compose -p j up --scale spark-worker=3 &
$ docker ps
CONTAINER ID   IMAGE   COMMAND   CREATED   STATUS   PORTS   NAMES
034cde65b534 nginx:latest "/docker-entrypoint.…" About an hour ago Up About an hour 80/tcp, 0.0.0.0:5000->5000/tcp nginx-lb
26938b83703c jentekllc/bigdata:latest "/run_sshd_worker.sh" About an hour ago Up About an hour 22/tcp, 0.0.0.0:50864->38080/tcp j_spark-worker_3
88b9ada1ab4c jentekllc/bigdata:latest "/run_sshd_worker.sh" About an hour ago Up About an hour 22/tcp, 0.0.0.0:50863->38080/tcp j_spark-worker_1
2f18f6b5f9a9 jentekllc/bigdata:latest "/run_sshd_worker.sh" About an hour ago Up About an hour 22/tcp, 0.0.0.0:50862->38080/tcp j_spark-worker_2
e497e89b4e67 jentekllc/samza:latest "/home/hadoop/start_…" About an hour ago Up About an hour 0.0.0.0:50022->22/tcp, 0.0.0.0:58088->8088/tcp, 0.0.0.0:58888->8888/tcp, 0.0.0.0:58889->8889/tcp samza-server
23e310732fe2 jentekllc/etl:latest "/start_etl.sh" About an hour ago Up About an hour 0.0.0.0:40022->22/tcp, 0.0.0.0:48080->8080/tcp, 0.0.0.0:48888->8888/tcp, 0.0.0.0:48889->8889/tcp, 0.0.0.0:49000->9000/tcp, 0.0.0.0:40000->30000/tcp etl-server
6d3c81c0d6ff jentekllc/bigdata:latest "/run_sshd_master.sh" About an hour ago Up About an hour 0.0.0.0:4040->4040/tcp, 0.0.0.0:7077->7077/tcp, 0.0.0.0:8080->8080/tcp, 0.0.0.0:8088->8088/tcp, 0.0.0.0:8888-8889->8888-8889/tcp, 0.0.0.0:20022->22/tcp spark-master
73ce7dd06186 jentekllc/bigdata:latest "/run_sshd_hive.sh" About an hour ago Up About an hour 0.0.0.0:30022->22/tcp, 0.0.0.0:38088->8088/tcp, 0.0.0.0:39000->9000/tcp, 0.0.0.0:39083->9083/tcp
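
Since the stack was started with `--scale spark-worker=3`, one way to double check the result is to count the worker containers by name. This is only a sketch: `count_workers` is a hypothetical helper, and it assumes the `j_spark-worker_N` naming seen in the listing above (newer Compose versions may join the parts with hyphens instead).

```shell
# Hypothetical helper: reads container names (one per line) on stdin and
# prints how many of them look like scaled Spark workers of project "j".
count_workers() {
    # grep -c exits non-zero when the count is 0, so mask that status.
    grep -c -E '^j_spark-worker_[0-9]+$' || true
}

# Usage (needs Docker, so it is left commented out here):
#   docker ps --format '{{.Names}}' | count_workers
```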
$ docker-compose -p j down
http://localhost:8080
http://localhost:5000
http://localhost:8888
http://localhost:48888
http://localhost:48080
http://localhost:49000
http://localhost:58088
http://localhost:58888
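
To see which container sits behind each of these URLs, the host ports can be matched against the PORTS column of the `docker ps` listing. The sketch below does that and optionally probes each URL with `curl`. The function names `container_for_port` and `check_urls` are hypothetical, and the mapping is read off the listing rather than taken from any official documentation.

```shell
# Hypothetical helper: maps a published host port to the container that owns
# it, following the PORTS column of the `docker ps` listing in this post.
container_for_port() {
    case "$1" in
        5000)              echo nginx-lb ;;
        8080|8888)         echo spark-master ;;
        48080|48888|49000) echo etl-server ;;
        58088|58888)       echo samza-server ;;
        *)                 echo unknown; return 1 ;;
    esac
}

# Probe every URL and print the HTTP status next to the owning container.
# curl reports 000 when it cannot connect within the 5 second limit.
check_urls() {
    for port in 8080 5000 8888 48888 48080 49000 58088 58888; do
        code="$(curl -s -o /dev/null -m 5 -w '%{http_code}' "http://localhost:${port}" || true)"
        printf '%-14s http://localhost:%s -> %s\n' "$(container_for_port "$port")" "$port" "$code"
    done
}

# Usage (needs the stack to be running, so it is left commented out here):
#   check_urls
```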
# From host
$ ssh -p 20022 hadoop@localhost
# Inside container
$ cd $SPARK_HOME
$ bin/spark-submit /spark/examples/src/main/python/pi.py
$ spark-sql
spark-sql
2021-11-06 23:26:42,842 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
Spark master: local[*], Application Id: local-1636241205417
spark-sql>
$ ssh -p 40022 hadoop@localhost
$ ssh -p 50022 hadoop@localhost
$ jps
144 ResourceManager
1233 ClusterBasedJobCoordinatorRunner
1729 LocalContainerRunner
1848 LocalContainerRunner
1689 LocalContainerRunner
586 Kafka
1946 Jps
426 NodeManager
76 QuorumPeerMain
1324 ClusterBasedJobCoordinatorRunner
1421 ClusterBasedJobCoordinatorRunner
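
The `jps` listing shows the long-running daemons the Samza container depends on: ZooKeeper (`QuorumPeerMain`), `Kafka`, and the YARN `ResourceManager` and `NodeManager`. A quick way to spot a daemon that failed to start might look like the sketch below. `missing_daemons` is a hypothetical helper, and the list of expected daemons is taken only from the listing above.

```shell
# Hypothetical helper: reads `jps` output on stdin and prints every expected
# daemon that is not listed.
missing_daemons() {
    jps_out="$(cat)"
    for daemon in QuorumPeerMain Kafka ResourceManager NodeManager; do
        # jps prints "<pid> <main class>", so compare against the second field.
        if ! printf '%s\n' "$jps_out" | awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            printf '%s\n' "$daemon"
        fi
    done
}

# Usage inside the container (left commented out here):
#   jps | missing_daemons
```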
$ file * | grep directory
hello-samza: directory
samza: directory

I am founder of Jen Tek LLC, a startup company in East Bay California developing AI powered, cloud based documentation/publishing software as a service.

George Jen
