All In One Custom Docker Image for Streaming/Real Time Data Preprocessing Developer on Apache Samza with Kafka

$ docker load < bigdata.tgz
$ docker load < etl.tgz
$ docker load < docker_samza.tar.gz
$ docker image ls

REPOSITORY          TAG       IMAGE ID       CREATED        SIZE
jentekllc/etl       latest    c3b44db4c5d6   30 hours ago   6.61GB
jentekllc/bigdata   latest    c7d3fb6d8221   31 hours ago   5.53GB
jentekllc/samza     latest    e9086c0ddaab   2 days ago     12.3GB
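Before starting the stack, it may be worth confirming that all three images actually loaded. The sketch below is one possible check, not part of the original setup: the helper name `missing_images` is hypothetical, and the repository names are copied from the listing above. It prints nothing when every expected image is present.

```shell
#!/bin/sh
# Repositories the listing above shows after the three `docker load` calls.
EXPECTED_IMAGES="jentekllc/etl jentekllc/bigdata jentekllc/samza"

# missing_images (hypothetical helper): reads repository names on stdin,
# one per line, and prints every expected image that is absent.
missing_images() {
    loaded=$(cat)
    for image in $EXPECTED_IMAGES; do
        printf '%s\n' "$loaded" | grep -qxF "$image" || printf '%s\n' "$image"
    done
}

# Only query Docker when the CLI is available on this machine.
if command -v docker >/dev/null 2>&1; then
    docker image ls --format '{{.Repository}}' | missing_images
fi
```

Any name it prints points to a `docker load` step that needs to be repeated.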
$ tar -xf additional_files.tar
$ nohup docker-compose -p j up --scale spark-worker=3 &
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
034cde65b534 nginx:latest "/docker-entrypoint.…" About an hour ago Up About an hour 80/tcp, 0.0.0.0:5000->5000/tcp nginx-lb
26938b83703c jentekllc/bigdata:latest "/run_sshd_worker.sh" About an hour ago Up About an hour 22/tcp, 0.0.0.0:50864->38080/tcp j_spark-worker_3
88b9ada1ab4c jentekllc/bigdata:latest "/run_sshd_worker.sh" About an hour ago Up About an hour 22/tcp, 0.0.0.0:50863->38080/tcp j_spark-worker_1
2f18f6b5f9a9 jentekllc/bigdata:latest "/run_sshd_worker.sh" About an hour ago Up About an hour 22/tcp, 0.0.0.0:50862->38080/tcp j_spark-worker_2
e497e89b4e67 jentekllc/samza:latest "/home/hadoop/start_…" About an hour ago Up About an hour 0.0.0.0:50022->22/tcp, 0.0.0.0:58088->8088/tcp, 0.0.0.0:58888->8888/tcp, 0.0.0.0:58889->8889/tcp samza-server
23e310732fe2 jentekllc/etl:latest "/start_etl.sh" About an hour ago Up About an hour 0.0.0.0:40022->22/tcp, 0.0.0.0:48080->8080/tcp, 0.0.0.0:48888->8888/tcp, 0.0.0.0:48889->8889/tcp, 0.0.0.0:49000->9000/tcp, 0.0.0.0:40000->30000/tcp etl-server
6d3c81c0d6ff jentekllc/bigdata:latest "/run_sshd_master.sh" About an hour ago Up About an hour 0.0.0.0:4040->4040/tcp, 0.0.0.0:7077->7077/tcp, 0.0.0.0:8080->8080/tcp, 0.0.0.0:8088->8088/tcp, 0.0.0.0:8888-8889->8888-8889/tcp, 0.0.0.0:20022->22/tcp spark-master
73ce7dd06186 jentekllc/bigdata:latest "/run_sshd_hive.sh" About an hour ago Up About an hour 0.0.0.0:30022->22/tcp, 0.0.0.0:38088->8088/tcp, 0.0.0.0:39000->9000/tcp, 0.0.0.0:39083->9083/tcp
$ docker-compose -p j down
http://localhost:8080
http://localhost:5000
http://localhost:8888
http://localhost:48888
http://localhost:48080
http://localhost:49000
http://localhost:58088
http://localhost:58888
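Once the containers are up, the URLs above can be probed from the host in one pass instead of being opened one by one. This is a minimal sketch that assumes `curl` is installed on the host; `probe_url` is a hypothetical helper name, and the 3 second timeout is an arbitrary choice.

```shell
#!/bin/sh
# Host URLs listed above, one per line.
URLS="http://localhost:8080
http://localhost:5000
http://localhost:8888
http://localhost:48888
http://localhost:48080
http://localhost:49000
http://localhost:58088
http://localhost:58888"

# probe_url (hypothetical helper): prints the HTTP status code curl reports.
# curl writes 000 when nothing answers; the fallback covers a missing curl.
probe_url() {
    code=$(curl -s -o /dev/null --max-time 3 -w '%{http_code}' "$1" 2>/dev/null)
    printf '%s\n' "${code:-000}"
}

# Print one "<status> <url>" line per endpoint.
for url in $URLS; do
    printf '%s %s\n' "$(probe_url "$url")" "$url"
done
```

A status of `000` suggests the service behind that port has not started yet or the port mapping differs on your machine.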
#From host
$ ssh -p 20022 hadoop@localhost
#Inside container
$ cd $SPARK_HOME
$ bin/spark-submit /spark/examples/src/main/python/pi.py
$ spark-sql
2021-11-06 23:26:42,842 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
Spark master: local[*], Application Id: local-1636241205417
spark-sql>
$ ssh -p 40022 hadoop@localhost
$ ssh -p 50022 hadoop@localhost
$ jps
144 ResourceManager
1233 ClusterBasedJobCoordinatorRunner
1729 LocalContainerRunner
1848 LocalContainerRunner
1689 LocalContainerRunner
586 Kafka
1946 Jps
426 NodeManager
76 QuorumPeerMain
1324 ClusterBasedJobCoordinatorRunner
1421 ClusterBasedJobCoordinatorRunner
$ file * | grep directory
hello-samza: directory
samza: directory
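The three SSH logins used above differ only in the host port. A small wrapper can save remembering them. This is a sketch only: `ssh_port` and `node_ssh` are hypothetical helper names, and the name-to-port pairs are read off the `docker ps` listing (the container mapped to port 30022 is left out because its name does not appear there).

```shell
#!/bin/sh
# ssh_port (hypothetical helper): maps a container name from the
# `docker ps` listing to the host port forwarded to its port 22.
ssh_port() {
    case "$1" in
        spark-master) echo 20022 ;;
        etl-server)   echo 40022 ;;
        samza-server) echo 50022 ;;
        *) echo "unknown container: $1" >&2; return 1 ;;
    esac
}

# node_ssh (hypothetical helper): opens a shell as the hadoop user
# on the named container, e.g. `node_ssh samza-server`.
node_ssh() {
    port=$(ssh_port "$1") || return 1
    ssh -p "$port" hadoop@localhost
}
```

For example, `node_ssh samza-server` would run the same command as `ssh -p 50022 hadoop@localhost`.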

George Jen

I am founder of Jen Tek LLC, a startup company in East Bay California developing AI powered, cloud based documentation/publishing software as a service.