Machine Learning Part 8 — Backpropagation in neural net in SQL

This is the eighth part of the ML series. For your convenience you can find other parts in the table of contents in Part 1 – Linear regression in MXNet

Last time we saw forward propagation in neural net. Today we are going to extend the process to backpropagate the errors. Let’s begin.

We need to add some more definitions to calculate output:

CREATE TABLE outputs (
  outputNode NUMERIC,
  outputValue NUMERIC
);

INSERT INTO outputs VALUES
    (1, 290)
   ,(2, 399)
   ,(3, 505)
;

CREATE TABLE outputs (

outputNode NUMERIC,

outputValue NUMERIC

);

INSERT INTO outputs VALUES

(1, 290)

,(2, 399)

,(3, 505)

;

Before we see some SQL code, let’s do some math. We had three layers (input, hidden, output), in input and output layers we used linear activation function. Hidden layer used ReLU.

We start with calculating loss function. We use normal squared error:

$\begin{gather*} Loss = \left[\begin{array}{c} \frac{\left(y^{out}_1 - target_1\right)^2 }{ 2 } \\ \frac{\left(y^{out}_2 - target_2\right)^2 }{ 2 } \end{array}\right] \end{gather*}$

Now let’s calculate partial derivatives to update weights between hidden layer and output layer:

$\begin{gather*} \left[\begin{array}{ccc} \frac{\partial Loss}{\partial W^2_{1,1}} & \frac{\partial Loss}{\partial W^2_{1,2}} & \frac{\partial Loss}{\partial W^2_{1,3}} \\ \frac{\partial Loss}{\partial W^2_{2,1}} & \frac{\partial Loss}{\partial W^2_{2,2}} & \frac{\partial Loss}{\partial W^2_{2,3}} \end{array}\right] = \left[\begin{array}{ccc} \frac{\partial Loss}{\partial y^{out}_1 } \frac{\partial y^{out}_1 }{\partial y^{in}_1} \frac{\partial y^{in}_1}{\partial W^2_{1,1}} & \frac{\partial Loss}{\partial y^{out}_2 } \frac{\partial y^{out}_2 }{\partial y^{in}_2} \frac{\partial y^{in}_2}{\partial W^2_{1,2}} & \frac{\partial Loss}{\partial y^{out}_3 } \frac{\partial y^{out}_3 }{\partial y^{in}_3} \frac{\partial y^{in}_3}{\partial W^2_{1,3}} \\ \frac{\partial Loss}{\partial y^{out}_1 } \frac{\partial y^{out}_1 }{\partial y^{in}_1} \frac{\partial y^{in}_1}{\partial W^2_{2,1}} & \frac{\partial Loss}{\partial y^{out}_2 } \frac{\partial y^{out}_2 }{\partial y^{in}_2} \frac{\partial y^{in}_2}{\partial W^2_{2,2}} & \frac{\partial Loss}{\partial y^{out}_3 } \frac{\partial y^{out}_3 }{\partial y^{in}_3} \frac{\partial y^{in}_3}{\partial W^2_{2,3}} \end{array}\right] =\\ \left[\begin{array}{ccc} (y^{out}_1 - target_1) \cdot 1 \cdot h^{out}_1 & (y^{out}_2 - target_2) \cdot 1 \cdot h^{out}_1 & (y^{out}_3 - target_3) \cdot 1 \cdot h^{out}_1 \\ (y^{out}_1 - target_1) \cdot 1 \cdot h^{out}_2 & (y^{out}_2 - target_2) \cdot 1 \cdot h^{out}_2 & (y^{out}_3 - target_3) \cdot 1 \cdot h^{out}_2 \\ \end{array}\right] \end{gather*}$

Now, the same for biases:

$\begin{gather*} \left[\begin{array}{ccc} \frac{\partial Loss}{\partial b^2_{1,1}} & \frac{\partial Loss}{\partial b^2_{1,2}} & \frac{\partial Loss}{\partial b^2_{1,3}} \\ \frac{\partial Loss}{\partial b^2_{2,1}} & \frac{\partial Loss}{\partial b^2_{2,2}} & \frac{\partial Loss}{\partial b^2_{2,3}} \end{array}\right] = \left[\begin{array}{ccc} \frac{\partial Loss}{\partial y^{out}_1 } \frac{\partial y^{out}_1 }{\partial y^{in}_1} \frac{\partial y^{in}_1}{\partial b^2_{1,1}} & \frac{\partial Loss}{\partial y^{out}_2 } \frac{\partial y^{out}_2 }{\partial y^{in}_2} \frac{\partial y^{in}_2}{\partial b^2_{1,2}} & \frac{\partial Loss}{\partial y^{out}_3 } \frac{\partial y^{out}_3 }{\partial y^{in}_3} \frac{\partial y^{in}_3}{\partial b^2_{1,3}} \\ \frac{\partial Loss}{\partial y^{out}_1 } \frac{\partial y^{out}_1 }{\partial y^{in}_1} \frac{\partial y^{in}_1}{\partial b^2_{2,1}} & \frac{\partial Loss}{\partial y^{out}_2 } \frac{\partial y^{out}_2 }{\partial y^{in}_2} \frac{\partial y^{in}_2}{\partial b^2_{2,2}} & \frac{\partial Loss}{\partial y^{out}_3 } \frac{\partial y^{out}_3 }{\partial y^{in}_3} \frac{\partial y^{in}_3}{\partial b^2_{2,3}} \end{array}\right] =\\ \left[\begin{array}{ccc} (y^{out}_1 - target_1) \cdot 1 \cdot 1 & (y^{out}_2 - target_2) \cdot 1 \cdot 1 & (y^{out}_3 - target_3) \cdot 1 \cdot 1 \\ (y^{out}_1 - target_1) \cdot 1 \cdot 1 & (y^{out}_2 - target_2) \cdot 1 \cdot 1 & (y^{out}_3 - target_3) \cdot 1 \cdot 1 \\ \end{array}\right] \end{gather*}$

That was easy. Now we use learning rate equal to $0.1$ and we can update both weights and biases between hidden layer and output layer.

Similar things go for other updates. If you are lost, you can find great explanation here.

Let’s now see the code:

WITH RECURSIVE currentPhase AS(
	SELECT CAST(0 AS NUMERIC) AS phase
),
oneRow AS(
	SELECT CAST(NULL AS NUMERIC) AS rowValue
),
solution AS (
	SELECT I.*, O1.rowValue AS inputLayerOutput, W1.*, I2.rowValue AS hiddenLayerInput, O2.rowValue AS hiddenLayerOutput, W2.*, I3.rowValue AS outputLayerInput, O3.rowValue AS outputLayerOutput, O.*, E.rowValue AS errorValue, P.*
	FROM inputs AS I
	CROSS JOIN oneRow AS O1
	JOIN weights1 AS W1 ON W1.weight1InputNodeNumber = I.inputNode
	CROSS JOIN oneRow AS I2
	CROSS JOIN oneRow AS O2
	JOIN weights2 AS W2 ON W2.weight2InputNodeNumber = W1.weight1OutputNodeNumber
	CROSS JOIN oneRow AS I3
	CROSS JOIN oneRow AS O3
	JOIN outputs AS O ON O.outputNode = W2.weight2OutputNodeNumber
	CROSS JOIN oneRow AS E
	CROSS JOIN currentPhase AS P

	UNION ALL
	
    SELECT
		inputNode,
		inputValue,

		CASE
			WHEN phase = 0 THEN inputValue
			ELSE inputLayerOutput
		END AS inputLayerOutput,

		weight1InputNodeNumber,
		weight1OutputNodeNumber,
		
		CASE
			WHEN phase = 6 THEN weight1Value - 0.1 * (SUM(outputLayerOutput - outputValue) OVER (PARTITION BY weight1InputNodeNumber, weight1OutputNodeNumber))  * 1 * weight2Value * (CASE WHEN hiddenLayerInput > 0 THEN 1 ELSE 0 END) * inputLayerOutput
			ELSE weight1Value
		END AS weight1Value,
		
		CASE
			WHEN phase = 6 THEN weight1Value - 0.1 * (SUM(outputLayerOutput - outputValue) OVER (PARTITION BY weight1InputNodeNumber, weight1OutputNodeNumber)) * 1 * weight2Value * (CASE WHEN hiddenLayerInput > 0 THEN 1 ELSE 0 END) * 1
			ELSE weight1Bias
		END weight1Bias,

		CASE
			WHEN phase = 1 THEN SUM(weight1Value * inputLayerOutput + weight1Bias) OVER (PARTITION BY weight1OutputNodeNumber, phase) / 3
			ELSE hiddenLayerInput
		END AS hiddenLayerInput,

		CASE
			WHEN phase = 2 THEN CASE WHEN hiddenLayerInput > 0 THEN hiddenLayerInput ELSE 0 END
			ELSE hiddenLayerOutput
		END AS hiddenLayerOutput,

		weight2InputNodeNumber,
		weight2OutputNodeNumber,
		
		CASE
			WHEN phase = 6 THEN weight2Value - 0.1 * (outputLayerOutput - outputValue) * 1 * hiddenLayerOutput
			ELSE weight2Value
		END AS weight2Value,
		
		CASE
			WHEN phase = 6 THEN weight2Value - 0.1 * (outputLayerOutput - outputValue) * 1 * 1
			ELSE weight2Bias
		END ASweight2Bias,

		CASE
			WHEN phase = 3 THEN SUM(weight2Value * hiddenLayerOutput + weight2Bias) OVER (PARTITION BY weight2OutputNodeNumber, phase) / 3
			ELSE outputLayerInput
		END AS outputLayerInput,

		CASE
			WHEN phase = 4 THEN outputLayerInput
			ELSE outputLayerOutput
		END AS outputLayerOutput,
		
		outputNode,
		outputValue,
		
		CASE
			WHEN phase = 5 THEN (outputLayerOutput - outputValue) * (outputLayerOutput - outputValue) / 2
			ELSE errorValue
		END AS errorValue,

		phase + 1 AS phase

	FROM solution
	WHERE phase <= 6
)
SELECT DISTINCT *
FROM solution WHERE phase = 7
ORDER BY weight1InputNodeNumber, weight1OutputNodeNumber, weight2OutputNodeNumber

WITH RECURSIVE currentPhase AS(

SELECT CAST(0 AS NUMERIC) AS phase

oneRow AS(

SELECT CAST(NULL AS NUMERIC) AS rowValue

solution AS (

SELECT I.*, O1.rowValue AS inputLayerOutput, W1.*, I2.rowValue AS hiddenLayerInput, O2.rowValue AS hiddenLayerOutput, W2.*, I3.rowValue AS outputLayerInput, O3.rowValue AS outputLayerOutput, O.*, E.rowValue AS errorValue, P.*

FROM inputs AS I

CROSS JOIN oneRow AS O1

JOIN weights1 AS W1 ON W1.weight1InputNodeNumber = I.inputNode

CROSS JOIN oneRow AS I2

CROSS JOIN oneRow AS O2

JOIN weights2 AS W2 ON W2.weight2InputNodeNumber = W1.weight1OutputNodeNumber

CROSS JOIN oneRow AS I3

CROSS JOIN oneRow AS O3

JOIN outputs AS O ON O.outputNode = W2.weight2OutputNodeNumber

CROSS JOIN oneRow AS E

CROSS JOIN currentPhase AS P

UNION ALL

SELECT

inputNode,

inputValue,

CASE

WHEN phase = 0 THEN inputValue

ELSE inputLayerOutput

END AS inputLayerOutput,

weight1InputNodeNumber,

weight1OutputNodeNumber,

CASE

WHEN phase = 6 THEN weight1Value - 0.1 * (SUM(outputLayerOutput - outputValue) OVER (PARTITION BY weight1InputNodeNumber, weight1OutputNodeNumber)) * 1 * weight2Value * (CASE WHEN hiddenLayerInput > 0 THEN 1 ELSE 0 END) * inputLayerOutput

ELSE weight1Value

END AS weight1Value,

CASE

ELSE weight1Bias

END weight1Bias,

CASE

WHEN phase = 1 THEN SUM(weight1Value * inputLayerOutput + weight1Bias) OVER (PARTITION BY weight1OutputNodeNumber, phase) / 3

ELSE hiddenLayerInput

END AS hiddenLayerInput,

CASE

WHEN phase = 2 THEN CASE WHEN hiddenLayerInput > 0 THEN hiddenLayerInput ELSE 0 END

ELSE hiddenLayerOutput

END AS hiddenLayerOutput,

weight2InputNodeNumber,

weight2OutputNodeNumber,

CASE

WHEN phase = 6 THEN weight2Value - 0.1 * (outputLayerOutput - outputValue) * 1 * hiddenLayerOutput

ELSE weight2Value

END AS weight2Value,

CASE

WHEN phase = 6 THEN weight2Value - 0.1 * (outputLayerOutput - outputValue) * 1 * 1

ELSE weight2Bias

END ASweight2Bias,

CASE

WHEN phase = 3 THEN SUM(weight2Value * hiddenLayerOutput + weight2Bias) OVER (PARTITION BY weight2OutputNodeNumber, phase) / 3

ELSE outputLayerInput

END AS outputLayerInput,

CASE

WHEN phase = 4 THEN outputLayerInput

ELSE outputLayerOutput

END AS outputLayerOutput,

outputNode,

outputValue,

CASE

WHEN phase = 5 THEN (outputLayerOutput - outputValue) * (outputLayerOutput - outputValue) / 2

ELSE errorValue

END AS errorValue,

phase + 1 AS phase

FROM solution

WHERE phase <= 6

)

SELECT DISTINCT *

FROM solution WHERE phase = 7

ORDER BY weight1InputNodeNumber, weight1OutputNodeNumber, weight2OutputNodeNumber

It is very similar to the solution from previous post. This time in phase 5 we calculate error, in phase 6 we update weights and biases. You can find results here.