George Jen, Jen Tek LLC


Apache Spark natively includes a library for graph computing called GraphX, a distributed graph processing framework. It is built on the Spark platform and provides a simple, easy-to-use, and rich interface for graph computing and graph mining, which greatly simplifies distributed graph processing.

However, GraphX currently supports only Scala: you need to write Scala code to import and invoke GraphX API calls, because Scala is native to Spark (Spark itself is written in Scala). The rest of Spark, apart from the GraphX module, supports Python, Java, and R in addition to Scala.
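To give a sense of what invoking the GraphX API from Scala looks like, here is a minimal sketch. The vertex names, the edge labels, and the local-mode session are made up for illustration, and the code assumes Spark with the GraphX module is on the classpath.

```scala
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.sql.SparkSession

object GraphxSketch {
  def main(args: Array[String]): Unit = {
    // Local-mode session for experimentation only (an assumption, not a cluster setup).
    val spark = SparkSession.builder()
      .appName("graphx-sketch")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical vertices: (vertex id, name).
    val vertices = sc.parallelize(Seq((1L, "alice"), (2L, "bob"), (3L, "carol")))

    // Hypothetical directed edges: source id, destination id, label.
    val edges = sc.parallelize(Seq(
      Edge(1L, 2L, "follows"),
      Edge(2L, 3L, "follows"),
      Edge(3L, 1L, "follows")
    ))

    // Build the property graph and query a few basic properties.
    val graph = Graph(vertices, edges)
    println(s"vertices=${graph.numVertices}, edges=${graph.numEdges}")
    graph.inDegrees.collect().foreach { case (id, degree) =>
      println(s"vertex $id has in-degree $degree")
    }

    spark.stop()
  }
}
```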

If you want…

George Jen, Jen Tek LLC


This tutorial covers integrating Spark SQL with Cassandra, and coding in Scala and Python with Spark SQL against a table in the Cassandra NoSQL database. I hope it provides value to those who are new to integrating Spark with Cassandra and want end-to-end working examples to get started quickly with their Spark/Cassandra applications.

Note: The GitHub README of Spark-Cassandra-Connector has some examples of running Scala commands through the interactive spark-shell and through spark-submit, but they require options such as --packages and --conf.
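For orientation, reading a Cassandra table through Spark SQL in Scala typically looks something like the sketch below. The keyspace `demo_ks`, the table `demo_table`, and the host `127.0.0.1` are hypothetical placeholders, and the code assumes the Spark-Cassandra-Connector jar is already on the classpath.

```scala
import org.apache.spark.sql.SparkSession

object CassandraReadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cassandra-read-sketch")
      .master("local[*]")
      // Assumed Cassandra contact point; adjust to your cluster.
      .config("spark.cassandra.connection.host", "127.0.0.1")
      .getOrCreate()

    // Load a Cassandra table as a DataFrame via the connector's data source.
    val df = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "demo_ks", "table" -> "demo_table")) // hypothetical names
      .load()

    // Register it as a temporary view so it can be queried with Spark SQL.
    df.createOrReplaceTempView("demo_table")
    spark.sql("SELECT * FROM demo_table LIMIT 10").show()

    spark.stop()
  }
}
```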

What I have demonstrated differently in this writing is to have a…

Create 3 node Hadoop and Spark Cluster Monitored by SignalFx from Splunk

George Jen, Jen Tek LLC


This article provides step-by-step instructions for creating a 3-node Hadoop and Hive cluster and a 3-node Spark cluster, both in standalone mode and managed by YARN from Hadoop, and for setting up monitoring with SignalFx from Splunk, using VirtualBox CentOS VMs on a single Windows 10 host PC. I hope it is helpful.

Objective and Task

I need to test a monitoring solution on a Hadoop and Spark installation, to monitor key Hadoop cluster and Spark application indicators as well as OS (Linux) indicators. …

George Jen, Jen Tek LLC

I need to test our application on a database service that can auto-scale. I decided to try Snowflake, so I signed up for an account.

My use case is simply to let Scala code access Snowflake with only the help of a JDBC driver.

Therefore, the following information is needed:

Access URL

Snowflake JDBC driver

The Snowflake JDBC driver can be downloaded from the Maven site; download the latest release.

At the time of this writing, I downloaded the following driver:


Access URL

To get the URL information, I have to log in to Snowflake with a browser to harvest…
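As a rough sketch of how the access URL and the JDBC driver fit together in Scala: the code below assumes the access URL follows the usual `<account>.snowflakecomputing.com` form and that the Snowflake JDBC driver jar is on the classpath. The object name, the helper names, and the environment variable names are hypothetical.

```scala
import java.sql.{Connection, DriverManager}
import java.util.Properties

object SnowflakeJdbcSketch {
  // Build the JDBC URL from the account identifier taken from the access URL.
  def jdbcUrl(accountIdentifier: String): String =
    s"jdbc:snowflake://${accountIdentifier}.snowflakecomputing.com/"

  // Credentials are passed to the driver as connection properties.
  def connectionProperties(user: String, password: String): Properties = {
    val props = new Properties()
    props.setProperty("user", user)
    props.setProperty("password", password)
    props
  }

  def connect(accountIdentifier: String, user: String, password: String): Connection =
    DriverManager.getConnection(jdbcUrl(accountIdentifier), connectionProperties(user, password))

  def main(args: Array[String]): Unit = {
    // Hypothetical environment variables, used to keep credentials out of the source.
    val conn = connect(
      sys.env("SNOWFLAKE_ACCOUNT"),
      sys.env("SNOWFLAKE_USER"),
      sys.env("SNOWFLAKE_PASSWORD")
    )
    try {
      // Simple smoke test: ask the service for its version.
      val rs = conn.createStatement().executeQuery("SELECT CURRENT_VERSION()")
      while (rs.next()) println(rs.getString(1))
    } finally {
      conn.close()
    }
  }
}
```

Keeping the URL and property construction in small pure functions makes them easy to check without a live Snowflake account.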

George Jen, Jen Tek LLC

This tutorial is for developers or learners who are new to Scala as a programming language and want to get familiar with the Scala development workflow: building a project and unit testing it for QA. If this is of interest to you, read on.

The prerequisites are a JDK, the Scala compiler, the sbt build tool, and some basic computer science knowledge.

This tutorial assumes that a Java JDK is already installed. You will need to install one if it is not.

Additionally, sbt, the Scala build tool, is needed. …

George Jen, Jen Tek LLC

If you need to write Scala code that uses Apache Spark Streaming to stream tweets from Twitter, you will need to import the Twitter API library as below:

import org.apache.spark.streaming.twitter._

Since this library does not come with Apache Spark, you will need to build its jar file and place it on the classpath.

Here is how.

To start with, find out your Spark and Scala versions, which are shown when you run $SPARK_HOME/bin/spark-shell:

$SPARK_HOME/bin/spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
20/06/12 14:06:22 WARN NativeCodeLoader…

George Jen

I am the founder of Jen Tek LLC, a startup company in the East Bay, California, developing AI-powered, cloud-based documentation/publishing software as a service.
