Connect to Snowflake AWS Cloud Database in Scala using JDBC driver

George Jen
5 min readJul 13, 2020

--

George Jen, Jen Tek LLC

I need to test our application on a database service that can be auto-scaled. Try snowflake, so sign up one.

My usage is to simply let Scala code to access to Snowflake, with only an assistance of a JDBC driver.

Therefore, below information is needed to be filled in:

Access URL
Driver
CLASSPATH
username/password

Snowflake JDBC driver

Snowflake JDBC driver can be downloaded at the Maven site, download the latest release

https://repo1.maven.org/maven2/net/snowflake/snowflake-jdbc/

At the time of this writing, I download the following driver:

snowflake-jdbc-3.9.2.jar

Access URL

To get URL information, I have to log into Snowflake with browser to harvest the URL

In my case, the database access URL is, you need to replace the <hostname> with yours:

<hostname>.is-east-2.aws.snowflakecomputing.com

Note it down.

Driver Info

JDBC driver info is a fully qualified reverse domain name of the Java main class. This information is needed, but it is hidden. How to find out? You will need to expand the jar file:

snowflake-jdbc-3.9.2.jar

The jar file is like a tar ball, but it is a jar ball. You can expand it with jar command, same as tar:

jar -xvf snowflake-jdbc-3.9.2.jar

Then look for META-INF/MANIFEST.MF from JDBC root directory:

jentek@jenteknano:~/snowflake$ cat META-INF/MANIFEST.MF
Manifest-Version: 1.0
Implementation-Title: snowflake-jdbc
Implementation-Version: 3.9.2
Archiver-Version: Plexus Archiver
Built-By: jenkins
Implementation-Vendor-Id: net.snowflake
Created-By: Apache Maven 3.5.4
Build-Jdk: 1.8.0_191
Main-Class: net.snowflake.client.jdbc.SnowflakeDriver

The driver info is:

net.snowflake.client.jdbc.SnowflakeDriver

CLASSPATH

CLASSPATH in simple term, is the path where the jar file is located.

In my case, the JDBC jar file snowflake-jdbc-3.9.2.jar is in

/home/jentek/snowflake

Hence it is the CLASSPATH to be used.

username/password

That is the username and password that I signed up with Snowflake.

Before proceeding with Scala coding, I need to test the JDBC access to my Snowflake account. I happened to have SQLWorkbench, an open source JDBC client that is ready to roll.

While at it, create database called md_docs and a table markdown that is to store markdown documents. I do notice that the SQL statements used in MySQL will need to be modified before run them on Snowflake.

Access Snowflake from Scala Code in Jupyter-notebook

Now that JDBC connectivity with Snowflake appears to be working, then do it in Scala.

Jupyter notebook is a perfect platform to write scratch code to just to make sure command line to work before complete the application in an IDE. Jupyter notebook is not just for Python, you can run other languages on it as long as you have the kernel. I like to write scratch code on Jupyter notebook with Scala kernel, provided setting the CLASSPATH properly. This writing will show you how to set the CLASSPATH correctly in Jupyter.

Import Connection, DriverManager class from Java.sql library to deal with connection and query, and ammonite.ops library to deal with the filesystem from existing Scala projects or applications.

import java.sql.{Connection,DriverManager}
import ammonite.ops._

Specify CLASSPATH inside Jupyter notebook Scala kernel

val path = java.nio.file.FileSystems.getDefault().getPath("/home/jentek/snowflake/snowflake-jdbc-3.9.2.jar")
val x = ammonite.ops.Path(path)
interp.load.cp(x)

Specify URL, Driver, Username/password

val url="jdbc:snowflake://<redacted>.us-east-2.aws.snowflakecomputing.com"
val driver = "net.snowflake.client.jdbc.SnowflakeDriver"
val username = "<redacted>"
val password = "<redacted>"

Initialize, connect to Snowflake and query the table

var connection:Connection = null
var document_id=""
var markdown=""
try {
Class.forName(driver)
connection = DriverManager.getConnection(url, username,password)
val statement = connection.createStatement
val rs = statement.executeQuery("select * from md_docs.public.markdown")
while (rs.next) {
document_id = rs.getString("DOCUMENT_ID")
markdown = rs.getString("MARKDOWN")
println(s"document_id = $document_id")
println(s"markdown = $markdown")
}
}
catch {
case e: Exception => e.printStackTrace
}
connection.close

Here is the query result:

document_id = 1
markdown = [TOC]


document_id = 3
markdown = # Preface

Apache Spark is one of the greatest open source all in one enterprise big data/analytic engine. It combines distributed/clustered computing, high availability, disruption resilience and fault tolerance. It is in memory computing. It encapsulates sophisticated SQL query capability on structure data like relational database tables and non structure data like NoSQL key value pairs, along with robust streaming, rich machine learning and statistics features, paired with a graph computing engine for applications such as social network and internet advertisement revenue driven search engines.

Spark is powerful. However, it is non trivial, especially to those developers who are new to Spark, many of the API functions lack runnable, end to end, invoking example. Because of the nature of functional programming language, invoking many API functions successfully takes effort and learning curve.

The audiences of this book are developers who are new to Spark.

I write this book from my teaching notes on Apache Spark. This book intends to cover all Spark API functions/methods with example codes that are executable and that are working, coupled with concise input data and output results, with goal to provide quick references to developers who can extract section of working command lines with correct input arguments to the API calls to be used in their code, saving their time from trial and error attempts that I have come through personally when I am writing this book.

As of now, there are 10 projects in the area of Spark SQL, Spark Streaming, Spark Machine Learning, and Spark Graphx are included in this ebook as demos for conducting data science and data engineering project with Apache Spark.

As always, codes written by me used in this book are in my GitHub repo:

[https://github.com/geyungjen/jentekllc](https://github.com/geyungjen/jentekllc)



George Jen

Jen Tek LLC

_Draft, work in progress_

Summary

I modified section of my existing stock Scala code that was written to work with MySQL using JDBC driver to store markdown documents. Switch to a different database such as Snowflake requires only minimal change of the code, the coding pattern is exactly the same. One might as well just write the universal connection library for multiple database servers, as long as all needed JDBC drivers for these databases are in one place called CLASSPATH, so switch a database is like to flip a switch.

Hope this writing helpful, thank you for your time viewing.

--

--

George Jen

I am founder of Jen Tek LLC, a startup company in East Bay California developing AI powered, cloud based documentation/publishing software as a service.