Machine Learning Part 7 — Forward propagation in neural net in SQL

This is the seventh part of the ML series. For your convenience you can find other parts in the table of contents in Part 1 – Linear regression in MXNet

Today we are going to create a neural net and calculate forward propagation using PostgreSQL. Let’s go.

We start with definition of the network: we will have input layer, hidden layer, and output layer. Input layer will have 3 nodes, hidden layer will have 2, output layer will have 3. In input layer we don’t do any transformation on the input data, in hidden layer we use ReLU, in output layer we use linear activation function (so no transformation).

Let’s start with the following definitions:

DROP TABLE IF EXISTS inputs;
DROP TABLE IF EXISTS weights1;
DROP TABLE IF EXISTS weights2;
DROP TABLE IF EXISTS biases;

CREATE TABLE inputs (
  inputNode NUMERIC,
  inputValue NUMERIC
);

INSERT INTO inputs VALUES
    (1, 1)
   ,(2, 3)
   ,(3, 5)
;

CREATE TABLE weights1 (
  weight1InputNodeNumber NUMERIC,
  weight1OutputNodeNumber NUMERIC,
  weight1Value NUMERIC,
  weight1Bias NUMERIC
);

INSERT INTO weights1 VALUES
    (1, 1, 2, 1)
   ,(1, 2, 3, 1)
   ,(2, 1, 4, 2)
   ,(2, 2, 5, 2)
   ,(3, 1, 6, 3)
   ,(3, 2, 7, 3)
;

CREATE TABLE weights2 (
  weight2InputNodeNumber NUMERIC,
  weight2OutputNodeNumber NUMERIC,
  weight2Value NUMERIC,
  weight2Bias NUMERIC
);

INSERT INTO weights2 VALUES
    (1, 1, 1, 2)
   ,(1, 2, 2, 2)
   ,(1, 3, 3, 2)
   ,(2, 1, 4, 3)
   ,(2, 2, 5, 3)
   ,(2, 3, 6, 3)
;

DROP TABLE IF EXISTS inputs;

DROP TABLE IF EXISTS weights1;

DROP TABLE IF EXISTS weights2;

DROP TABLE IF EXISTS biases;

CREATE TABLE inputs (

inputNode NUMERIC,

inputValue NUMERIC

);

INSERT INTO inputs VALUES

(1, 1)

,(2, 3)

,(3, 5)

;

CREATE TABLE weights1 (

weight1InputNodeNumber NUMERIC,

weight1OutputNodeNumber NUMERIC,

weight1Value NUMERIC,

weight1Bias NUMERIC

);

INSERT INTO weights1 VALUES

(1, 1, 2, 1)

,(1, 2, 3, 1)

,(2, 1, 4, 2)

,(2, 2, 5, 2)

,(3, 1, 6, 3)

,(3, 2, 7, 3)

;

CREATE TABLE weights2 (

weight2InputNodeNumber NUMERIC,

weight2OutputNodeNumber NUMERIC,

weight2Value NUMERIC,

weight2Bias NUMERIC

);

INSERT INTO weights2 VALUES

(1, 1, 1, 2)

,(1, 2, 2, 2)

,(1, 3, 3, 2)

,(2, 1, 4, 3)

,(2, 2, 5, 3)

,(2, 3, 6, 3)

;

We define some input values, weights and biases. Values are completely made up and do not make a difference.

Before we write SQL code, let’s calculate result manually.

We have the following variables:

$\begin{gather*} input = \left[\begin{array}{c} 1 \\ 3 \\ 5 \end{array}\right] \\ W^1 = \left[\begin{array}{cc} 2 & 3 \\ 4 & 5 \\ 6 & 7 \end{array}\right] \\ b^1 = \left[\begin{array}{cc} 1 & 1 \\ 2 & 2 \\ 3 & 3 \end{array}\right] \\ W^2 = \left[\begin{array}{ccc} 1 & 2 & 3 \\ 4 & 5 & 6 \end{array}\right] \\ b^2 = \left[\begin{array}{ccc} 2 & 2 & 2 \\ 3 & 3 & 3 \end{array}\right] \\ \end{gather*}$

Now, let’s calculate input for hidden layer:

$\begin{gather*} h^{in} = \left[\begin{array}{c} W^1_{1, 1} \cdot input_1 + b^1_{1, 1} + W^1_{2, 1} \cdot input_2 + b^1_{2, 1} + W^1_{3, 1} \cdot input_3 + b^1_{3, 1} \\ W^1_{1, 2} \cdot input_1 + b^1_{1, 2} + W^1_{2, 2} \cdot input_2 + b^1_{2, 2} + W^1_{3, 2} \cdot input_3 + b^1_{3, 2} \end{array}\right] \end{gather*}$

Now, we use ReLU activation function for hidden layer:

$\begin{gather*} h^{out} = \left[\begin{array}{c} \max(h^{in}_1, 0) \\ \max(h^{in}_2, 0) \end{array}\right] \end{gather*}$

We carry on with calculating input for output layer:

$\begin{gather*} y^{in} = \left[\begin{array}{c} W^2_{1, 1} \cdot h^{out}_1 + b^2_{1, 1} + W^2_{2, 1} \cdot h^{out}_2 + b^2_{2, 1} \\ W^2_{1, 2} \cdot h^{out}_1 + b^2_{1, 2} + W^2_{2, 2} \cdot h^{out}_2 + b^2_{2, 2} \\ W^2_{1, 3} \cdot h^{out}_1 + b^2_{1, 3} + W^2_{2, 3} \cdot h^{out}_2 + b^2_{2, 3} \end{array}\right] \end{gather*}$

Activation function for output layer is linear, so it is easy now:

$\begin{gather*} y^{out} = y^{in} \end{gather*}$

We will calculate errors next time.

Now, let’s calculate the result:

WITH RECURSIVE currentPhase AS(
	SELECT CAST(0 AS NUMERIC) AS phase
),
oneRow AS(
	SELECT CAST(NULL AS NUMERIC) AS rowValue
),
solution AS (
	SELECT I.*, O1.rowValue AS inputLayerOutput, W1.*, I2.rowValue AS hiddenLayerInput, O2.rowValue AS hiddenLayerOutput, W2.*, I3.rowValue AS outputLayerInput, O3.rowValue AS outputLayerOutput, P.*
	FROM inputs AS I
	CROSS JOIN oneRow AS O1
	JOIN weights1 AS W1 ON W1.weight1InputNodeNumber = I.inputNode
	CROSS JOIN oneRow AS I2
	CROSS JOIN oneRow AS O2
	JOIN weights2 AS W2 ON W2.weight2InputNodeNumber = W1.weight1OutputNodeNumber
	CROSS JOIN oneRow AS I3
	CROSS JOIN oneRow AS O3
	CROSS JOIN currentPhase AS P

	UNION ALL
	
    SELECT
		inputNode,
		inputValue,

		CASE
			WHEN phase = 0 THEN inputValue
			ELSE inputLayerOutput
		END AS inputLayerOutput,

		weight1InputNodeNumber,
		weight1OutputNodeNumber,
		weight1Value,
		weight1Bias,

		CASE
			WHEN phase = 1 THEN SUM(weight1Value * inputLayerOutput + weight1Bias) OVER (PARTITION BY weight1OutputNodeNumber, phase) / 3
			ELSE hiddenLayerInput
		END AS hiddenLayerInput,

		CASE
			WHEN phase = 2 THEN CASE WHEN hiddenLayerInput > 0 THEN hiddenLayerInput ELSE 0 END
			ELSE hiddenLayerOutput
		END AS hiddenLayerOutput,

		weight2InputNodeNumber,
		weight2OutputNodeNumber,
		weight2Value,
		weight2Bias,

		CASE
			WHEN phase = 3 THEN SUM(weight2Value * hiddenLayerOutput + weight2Bias) OVER (PARTITION BY weight2OutputNodeNumber, phase) / 3
			ELSE outputLayerInput
		END AS outputLayerInput,

		CASE
			WHEN phase = 4 THEN outputLayerInput
			ELSE outputLayerOutput
		END AS outputLayerOutput,

		phase + 1 AS phase

	FROM solution
	WHERE phase <= 4
)
SELECT DISTINCT weight2OutputNodeNumber, outputLayerOutput
FROM solution WHERE phase = 5

WITH RECURSIVE currentPhase AS(

SELECT CAST(0 AS NUMERIC) AS phase

oneRow AS(

SELECT CAST(NULL AS NUMERIC) AS rowValue

solution AS (

SELECT I.*, O1.rowValue AS inputLayerOutput, W1.*, I2.rowValue AS hiddenLayerInput, O2.rowValue AS hiddenLayerOutput, W2.*, I3.rowValue AS outputLayerInput, O3.rowValue AS outputLayerOutput, P.*

FROM inputs AS I

CROSS JOIN oneRow AS O1

JOIN weights1 AS W1 ON W1.weight1InputNodeNumber = I.inputNode

CROSS JOIN oneRow AS I2

CROSS JOIN oneRow AS O2

JOIN weights2 AS W2 ON W2.weight2InputNodeNumber = W1.weight1OutputNodeNumber

CROSS JOIN oneRow AS I3

CROSS JOIN oneRow AS O3

CROSS JOIN currentPhase AS P

UNION ALL

SELECT

inputNode,

inputValue,

CASE

WHEN phase = 0 THEN inputValue

ELSE inputLayerOutput

END AS inputLayerOutput,

weight1InputNodeNumber,

weight1OutputNodeNumber,

weight1Value,

weight1Bias,

CASE

WHEN phase = 1 THEN SUM(weight1Value * inputLayerOutput + weight1Bias) OVER (PARTITION BY weight1OutputNodeNumber, phase) / 3

ELSE hiddenLayerInput

END AS hiddenLayerInput,

CASE

WHEN phase = 2 THEN CASE WHEN hiddenLayerInput > 0 THEN hiddenLayerInput ELSE 0 END

ELSE hiddenLayerOutput

END AS hiddenLayerOutput,

weight2InputNodeNumber,

weight2OutputNodeNumber,

weight2Value,

weight2Bias,

CASE

WHEN phase = 3 THEN SUM(weight2Value * hiddenLayerOutput + weight2Bias) OVER (PARTITION BY weight2OutputNodeNumber, phase) / 3

ELSE outputLayerInput

END AS outputLayerInput,

CASE

WHEN phase = 4 THEN outputLayerInput

ELSE outputLayerOutput

END AS outputLayerOutput,

phase + 1 AS phase

FROM solution

WHERE phase <= 4

)

SELECT DISTINCT weight2OutputNodeNumber, outputLayerOutput

FROM solution WHERE phase = 5

This is actually very easy. We divide the process into multiple phases. Each row of CTE represents one complete path from some input node to some output node. Initially row carries some metadata and input value, in each phase we fill some next value using different case expressions.

In phase 0 we get the input and transform it into output, since input layer has no logic, we just copy the value.
In phase 1 we calculate inputs for next layer by multiplying weights and values.
In phase 2 we activate hidden layer. Since we use ReLU, we perform a very simple comparison.
In phase 3 we once again use weights and values to calculate input for next layer, this time we use different weights.
In phase 4 we activate output layer, which just copies values (since we use a linear activation function).

So in our query we start by defining a schema. We simply join all tables and cross join dummy table with one row which we use to define additional column. We fill these columns later throughout the process.

In recursive part of CTE we simply either rewrite values or do some logic depending on the phase number.

You can see results here.

Next time we will see how to backpropagate errors.