Category Archives

Archive of posts published in the category: SQL

Data encryption in s3 in Spark in EMR with multiple encryption schemes

Spark supports multiple encryption schemes. You can use client side encryption, server side encryption, etc. What wasn’t working for me for a long time is reading encrypted data and writing as a plain text. Before reading I was configuring encryption and this was…

Running Anaconda with DGL and mxnet on CUDA GPU in Spark running in EMR

Today I’m going to share my configuration for running custom Anaconda Python with DGL (Deep Graph Library) and mxnet library, with GPU support via CUDA, running in Spark hosted in EMR. Actually, I have Redshift configuration as well, with support for gensim, tensorflow,…

Running any query in Redshift or JDBC from Spark in EMR

Last time we saw how to connect to Redshift from Spark running in EMR. Provided solution was nice but allowed for reading data only. Sometimes we might want to run any DDL or DML query, not only simple read statements. To do that,…

Connecting to Redshift from Spark running in EMR

Today I’ll share my configuration for Spark running in EMR to connect to Redshift cluster. First, I assume the cluster is accessible (so configure virtual subnet, allowed IPs and all network stuff before running this). I’m using Zeppelin so I’ll show two interpreters…

Aborted transactions in Redshift due to Serializable isolation level

I was working with seemingly simple load job in Redshift. I was generating some data and wanted to store the output in separate table, just to persist it for debugging purposes. However, since the job could be reran if needed, I wanted to…

Machine Learning Part 8 — Backpropagation in neural net in SQL

This is the eighth part of the ML series. For your convenience you can find other parts in the table of contents in Part 1 – Linear regression in MXNet Last time we saw forward propagation in neural net. Today we are going…

Machine Learning Part 7 — Forward propagation in neural net in SQL

This is the seventh part of the ML series. For your convenience you can find other parts in the table of contents in Part 1 – Linear regression in MXNet Today we are going to create a neural net and calculate forward propagation…

Windowing functions in recursive CTE

Today we will see an interesting case of incompatibility between MS SQL Server 2017 and PostgreSQL 9.6 (and different versions as well). Let’s start with this code:

We emulate a recursive CTE. We have two columns in source dataset, we want to…

Machine Learning Part 6 — Matrix multiplication in SQL

This is the sixth part of the ML series. For your convenience you can find other parts in the table of contents in Part 1 – Linear regression in MXNet Today we are going to implement a matrix multiplication in Redshift. Let’s go.…

Machine Learning Part 4 — Linear regression in T-SQL

This is the fourth part of the ML series. For your convenience you can find other parts in the table of contents in Part 1 – Linear regression in MXNet This time we are going to implement linear regression as a function. This…

Machine Learning Part 3 — Linear regression in SQL revisited

This is the third part of the ML series. For your convenience you can find other parts in the table of contents in Part 1 – Linear regression in MXNet Last time we saw how to calculate linear regression for the Iris dataset.…

Machine Learning Part 2 — Linear regression in SQL

This is the second part of the ML series. For your convenience you can find other parts in the table of contents in Part 1 – Linear regression in MXNet Image that you have only a data warehouse with SQL capabilities to train…

Executing SQL query inside Excel spreadsheet

Did you know that you can execute almost any SQL query directly in your Excel spreadsheet? This can be very useful for executing queries with GROUP BY and aggregates since they cannot be generated that easily. Let’s begin. I will use Excel 2013…

SQLxD Part 23 — Query parser tests

This is the twenty third part of the SQLxD series. For your convenience you can find other parts in the table of contents in Part 1 – XML Transformation Here go the tests for query parser:

If you followed this series then…

SQLxD Part 22 — Query parser

This is the twenty second part of the SQLxD series. For your convenience you can find other parts in the table of contents in Part 1 – XML Transformation For parsing queries we use IronPython. Let’s begin with C# code for executing PLY…

SQLxD Part 21 — SELECT

This is the twenty first part of the SQLxD series. For your convenience you can find other parts in the table of contents in Part 1 – XML Transformation We have all things to transform rows and columns. Now it is time to…

SQLxD Part 20 — Transformers

This is the twentieth part of the SQLxD series. For your convenience you can find other parts in the table of contents in Part 1 – XML Transformation We can extract nodes, transform them into rows, filter, join, group, order, and transform them.…

SQLxD Part 19 — Expressions tests

This is the nineteenth part of the SQLxD series. For your convenience you can find other parts in the table of contents in Part 1 – XML Transformation Last time we implemented expressions, it is high time to test them:

SQLxD Part 18 — Expressions

This is the eighteenth part of the SQLxD series. For your convenience you can find other parts in the table of contents in Part 1 – XML Transformation Last time we saw tests for aggregates. Today we are going to implement expressions. We…

SQLxD Part 17 — Tests for aggregates

This is the seventeenth part of the SQLxD series. For your convenience you can find other parts in the table of contents in Part 1 – XML Transformation Last time we implemented aggregates, today we test them. Here are the snippets: