Connect to Snowflake AWS Cloud Database in Scala using JDBC driver
George Jen, Jen Tek LLC
I need to test our application on a database service that can auto-scale, so I decided to try Snowflake and signed up for an account.
My goal is simply to let Scala code access Snowflake, with nothing more than a JDBC driver to assist.
Therefore, the following information needs to be gathered:
Access URL
Driver
CLASSPATH
username/password
Snowflake JDBC driver
The Snowflake JDBC driver can be downloaded from the Maven repository; grab the latest release:
https://repo1.maven.org/maven2/net/snowflake/snowflake-jdbc/
At the time of this writing, I downloaded the following driver:
snowflake-jdbc-3.9.2.jar
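If you manage dependencies with sbt instead of downloading the jar manually, the same artifact can be pulled in with a one-line dependency; the coordinates below come from the Maven path above:
libraryDependencies += "net.snowflake" % "snowflake-jdbc" % "3.9.2"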
Access URL
To get the URL information, I have to log into Snowflake with a browser to harvest it.
In my case, the database access URL is the following; you need to replace <hostname> with yours:
<hostname>.us-east-2.aws.snowflakecomputing.com
Note it down.
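Later, this host name becomes the JDBC connection string by prefixing it with jdbc:snowflake://; a minimal Scala sketch:
// Build the JDBC URL from your Snowflake host name (replace <hostname> with yours).
val hostname = "<hostname>.us-east-2.aws.snowflakecomputing.com"
val url = s"jdbc:snowflake://$hostname"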
Driver Info
The JDBC driver info is the fully qualified class name (in reverse domain name form) of the driver's main Java class. This information is needed, but it is not advertised. How do you find it? You will need to expand the jar file:
snowflake-jdbc-3.9.2.jar
The jar file is like a tar ball, but it is a jar ball. You can expand it with the jar command, whose options work the same way as tar's:
jar -xvf snowflake-jdbc-3.9.2.jar
Then look for META-INF/MANIFEST.MF under the directory where you expanded the jar:
jentek@jenteknano:~/snowflake$ cat META-INF/MANIFEST.MF
Manifest-Version: 1.0
Implementation-Title: snowflake-jdbc
Implementation-Version: 3.9.2
Archiver-Version: Plexus Archiver
Built-By: jenkins
Implementation-Vendor-Id: net.snowflake
Created-By: Apache Maven 3.5.4
Build-Jdk: 1.8.0_191
Main-Class: net.snowflake.client.jdbc.SnowflakeDriver
The driver info is:
net.snowflake.client.jdbc.SnowflakeDriver
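Alternatively, you can read the Main-Class attribute programmatically without unpacking the jar; here is a minimal Scala sketch, assuming the jar path used earlier:
import java.util.jar.JarFile

// Read the driver's main class straight from the jar's manifest.
val jar = new JarFile("/home/jentek/snowflake/snowflake-jdbc-3.9.2.jar")
val mainClass = jar.getManifest.getMainAttributes.getValue("Main-Class")
println(mainClass)  // prints net.snowflake.client.jdbc.SnowflakeDriver
jar.close()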
CLASSPATH
CLASSPATH, in simple terms, is the path where the JVM looks for classes and jar files.
In my case, the JDBC jar file snowflake-jdbc-3.9.2.jar is in
/home/jentek/snowflake
Hence the CLASSPATH entry to use is the full path to that jar. Note that for jar files the classpath entry must name the jar itself, or use a wildcard such as /home/jentek/snowflake/*; a bare directory only picks up .class files.
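In a plain JVM process you can confirm what actually ended up on the classpath by printing the java.class.path system property; a quick Scala sanity check:
// Print the JVM's effective classpath; the snowflake-jdbc jar should appear in it.
println(sys.props("java.class.path"))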
username/password
These are the username and password I chose when I signed up for Snowflake.
Before proceeding with Scala coding, I need to test JDBC access to my Snowflake account. I happen to have SQLWorkbench, an open source JDBC client, ready to roll.
While at it, I create a database called md_docs and a table called markdown to store markdown documents; the DDL is sketched below. I do notice that SQL statements written for MySQL need to be modified before running them on Snowflake.
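I ran the DDL in SQLWorkbench, but the equivalent can be done from Scala through JDBC. A minimal, self-contained sketch; the column names match the query later in this article, while the column types are my assumption:
import java.sql.DriverManager

// Hypothetical credentials; replace with your own account, username and password.
val conn = DriverManager.getConnection(
  "jdbc:snowflake://<hostname>.us-east-2.aws.snowflakecomputing.com",
  "<username>", "<password>")
val stmt = conn.createStatement
// Snowflake DDL; the column types here are assumptions, adjust to your needs.
stmt.execute("create database if not exists md_docs")
stmt.execute("create table if not exists md_docs.public.markdown (document_id integer, markdown string)")
stmt.close()
conn.close()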
Access Snowflake from Scala Code in Jupyter-notebook
Now that JDBC connectivity with Snowflake appears to be working, let's do the same in Scala.
A Jupyter notebook is a perfect platform for writing scratch code, just to make sure the commands work before completing the application in an IDE. Jupyter notebook is not just for Python; you can run other languages on it as long as you have the kernel. I like to write scratch code in a Jupyter notebook with a Scala kernel, provided the CLASSPATH is set properly. This writing will show you how to set the CLASSPATH correctly in Jupyter.
Import the Connection and DriverManager classes from the java.sql package to deal with the connection and query, and the ammonite.ops library to deal with filesystem paths:
import java.sql.{Connection,DriverManager}
import ammonite.ops._
Specify CLASSPATH inside Jupyter notebook Scala kernel
// Point to the Snowflake JDBC jar on disk.
val path = java.nio.file.FileSystems.getDefault().getPath("/home/jentek/snowflake/snowflake-jdbc-3.9.2.jar")
// Convert to an Ammonite path and load the jar onto the kernel's classpath.
val x = ammonite.ops.Path(path)
interp.load.cp(x)
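Note that interp.load.cp comes from the Ammonite interpreter API used by the Almond Scala kernel; in a standalone application you would instead put the jar on the classpath when launching the JVM.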
Specify URL, Driver, Username/password
val url="jdbc:snowflake://<redacted>.us-east-2.aws.snowflakecomputing.com"
val driver = "net.snowflake.client.jdbc.SnowflakeDriver"
val username = "<redacted>"
val password = "<redacted>"
Initialize, connect to Snowflake and query the table
var connection: Connection = null
var document_id = ""
var markdown = ""
try {
  // Load the driver class, then open the connection.
  Class.forName(driver)
  connection = DriverManager.getConnection(url, username, password)
  val statement = connection.createStatement
  val rs = statement.executeQuery("select * from md_docs.public.markdown")
  while (rs.next) {
    document_id = rs.getString("DOCUMENT_ID")
    markdown = rs.getString("MARKDOWN")
    println(s"document_id = $document_id")
    println(s"markdown = $markdown")
  }
}
catch {
  case e: Exception => e.printStackTrace
}
finally {
  // Close the connection even when the query fails; guard against a failed connect.
  if (connection != null) connection.close
}
Here is the query result:
document_id = 1
markdown = [TOC]
document_id = 3
markdown = # Preface
Apache Spark is one of the greatest open source all in one enterprise big data/analytic engine. It combines distributed/clustered computing, high availability, disruption resilience and fault tolerance. It is in memory computing. It encapsulates sophisticated SQL query capability on structure data like relational database tables and non structure data like NoSQL key value pairs, along with robust streaming, rich machine learning and statistics features, paired with a graph computing engine for applications such as social network and internet advertisement revenue driven search engines.
Spark is powerful. However, it is non trivial, especially to those developers who are new to Spark, many of the API functions lack runnable, end to end, invoking example. Because of the nature of functional programming language, invoking many API functions successfully takes effort and learning curve.
The audiences of this book are developers who are new to Spark.
I write this book from my teaching notes on Apache Spark. This book intends to cover all Spark API functions/methods with example codes that are executable and that are working, coupled with concise input data and output results, with goal to provide quick references to developers who can extract section of working command lines with correct input arguments to the API calls to be used in their code, saving their time from trial and error attempts that I have come through personally when I am writing this book.
As of now, there are 10 projects in the area of Spark SQL, Spark Streaming, Spark Machine Learning, and Spark Graphx are included in this ebook as demos for conducting data science and data engineering project with Apache Spark.
As always, codes written by me used in this book are in my GitHub repo:
[https://github.com/geyungjen/jentekllc](https://github.com/geyungjen/jentekllc)
George Jen
Jen Tek LLC
_Draft, work in progress_
Summary
I modified a section of my existing Scala code, originally written to store markdown documents in MySQL over JDBC. Switching to a different database such as Snowflake requires only minimal code change; the coding pattern is exactly the same. One might as well write a universal connection library for multiple database servers: as long as all the needed JDBC drivers are in one place, the CLASSPATH, switching databases is like flipping a switch.
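To illustrate the idea, here is a minimal, hypothetical sketch of such a universal connector. The driver class names are the standard ones shipped by each vendor, and every driver jar is assumed to already be on the CLASSPATH:
import java.sql.{Connection, DriverManager}

object UniversalConnector {
  // Standard driver class names; add more databases as needed.
  private val drivers = Map(
    "mysql"     -> "com.mysql.cj.jdbc.Driver",
    "snowflake" -> "net.snowflake.client.jdbc.SnowflakeDriver"
  )

  def connect(db: String, url: String, user: String, password: String): Connection = {
    Class.forName(drivers(db))  // load the driver for the chosen database
    DriverManager.getConnection(url, user, password)
  }
}

// Usage: switching databases is just a different key and URL, for example:
// val conn = UniversalConnector.connect("snowflake",
//   "jdbc:snowflake://<hostname>.us-east-2.aws.snowflakecomputing.com",
//   "<username>", "<password>")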
I hope you find this writing helpful; thank you for your time.