This is the sixth part of the ML series. For your convenience you can find other parts in the table of contents in Part 1 – Linear regression in MXNet
Today we are going to implement a matrix multiplication in Redshift. Let’s go.
First, let’s see what we want to calculate:
Nothing fancy. We would like our algorithm to be extensible to any sizes and non-uniform matrices as well.
Let’s start with matrix representation:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 |
DROP TABLE IF EXISTS matrix1; CREATE TEMP TABLE matrix1 ( rowNumber INT, columnNumber INT, value INT ); DROP TABLE IF EXISTS matrix2; CREATE TEMP TABLE matrix2 ( rowNumber INT, columnNumber INT, value INT ); INSERT INTO matrix1 VALUES (1, 1, 2) ,(1, 2, 3) ,(2, 1, 4) ,(2, 2, 5) ; INSERT INTO matrix2 VALUES (1, 1, 5) ,(1, 2, 3) ,(2, 1, 2) ,(2, 2, 4) ; |
We store the matrices as a rows where each row represents one value for given row and column. Rows and columns are one-based.
First, we need to calculate size of the result:
1 2 3 4 5 6 7 8 9 |
WITH maxWidth AS( SELECT MAX(columnNumber) AS width FROM matrix2 ), maxHeight AS ( SELECT MAX(rowNumber) AS height FROM matrix1 ), resultDimensions AS ( SELECT width, height FROM maxWidth CROSS JOIN maxHeight ), |
So we just get the maximum width and maximum height from the respective matrices. Now, we want to generate all the cells we need to fill:
1 2 3 4 5 6 7 8 9 |
rowNums AS ( SELECT (row_number() OVER (ORDER BY 1)) AS rowNumber FROM matrix1 WHERE rowNumber <= (SELECT MAX(height) FROM resultDimensions) ), columnNums AS ( SELECT (row_number() OVER (ORDER BY 1)) AS columnNumber FROM matrix2 WHERE columnNumber <= (SELECT width FROM resultDimensions) ), positions AS ( SELECT rowNumber, columnNumber FROM rowNums CROSS JOIN columnNums ), |
So we basically do the Cartesian product and we are done. Now, we would like to get correct pairs for each cell:
1 2 3 4 5 6 |
pairsForPositions AS ( SELECT P.rowNumber, P.columnNumber, M1.value AS M1, M2.value AS M2 FROM positions AS P JOIN matrix1 AS M1 ON M1.rowNumber = P.rowNumber JOIN matrix2 AS M2 ON M2.columnNumber = P.columnNumber AND M2.rowNumber = M1.columnNumber ), |
This is what we get for our sample matrices:
1 2 3 4 5 6 7 8 9 |
row column m1 m2 1 1 2 5 1 1 3 2 1 2 2 3 1 2 3 4 2 1 4 5 2 1 5 2 2 2 4 3 2 2 5 4 |
Looks good. Now we just need to aggregate the pairs:
1 2 3 4 5 6 |
results AS ( SELECT rowNumber, columnNumber, SUM(M1 * M2) AS value FROM pairsForPositions GROUP BY rowNumber, columnNumber ) SELECT * FROM results ORDER BY rowNumber, columnNumber |
And we are done. You can see the code here.