This is the eighth part of the ML series. For your convenience, you can find the other parts in the table of contents in Part 1 – Linear regression in MXNet.

Last time we saw forward propagation in a neural net. Today we are going to extend the process to backpropagate the errors. Let’s begin.

We need to add some more definitions to calculate output:

Before we see some SQL code, let’s do some math. We had three layers (input, hidden, output). The input and output layers used a linear activation function, and the hidden layer used ReLU.

We start by calculating the loss function. We use the ordinary squared error, halved so that the factor of 2 cancels when we differentiate:

    \begin{gather*} Loss = \left[\begin{array}{c} \frac{\left(y^{out}_1 - target_1\right)^2 }{ 2 } \\ \frac{\left(y^{out}_2 - target_2\right)^2 }{ 2 } \\ \frac{\left(y^{out}_3 - target_3\right)^2 }{ 2 } \end{array}\right] \end{gather*}
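To make the formula concrete, here is a minimal Python sketch of the same per-output loss. The function name `squared_error_loss` and the sample numbers are made up for illustration; they do not come from the SQL solution.

```python
def squared_error_loss(outputs, targets):
    """Return the loss vector: (y_out - target)^2 / 2 for each output node."""
    return [(y - t) ** 2 / 2 for y, t in zip(outputs, targets)]


# Hypothetical network outputs and targets, chosen only for illustration.
y_out = [1.0, 2.0, 4.0]
target = [1.0, 4.0, 1.0]

loss = squared_error_loss(y_out, target)
print(loss)  # [0.0, 2.0, 4.5]
```

Each entry depends only on its own output node, which is why the partial derivatives of the loss with respect to the outputs stay so simple.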

Now let’s calculate the partial derivatives needed to update the weights between the hidden layer and the output layer:

    \begin{gather*} \left[\begin{array}{ccc} \frac{\partial Loss}{\partial W^2_{1,1}} & \frac{\partial Loss}{\partial W^2_{1,2}} & \frac{\partial Loss}{\partial W^2_{1,3}} \\ \frac{\partial Loss}{\partial W^2_{2,1}} & \frac{\partial Loss}{\partial W^2_{2,2}} & \frac{\partial Loss}{\partial W^2_{2,3}} \end{array}\right] =  \left[\begin{array}{ccc}  \frac{\partial Loss}{\partial y^{out}_1 } \frac{\partial y^{out}_1 }{\partial y^{in}_1} \frac{\partial y^{in}_1}{\partial W^2_{1,1}} & \frac{\partial Loss}{\partial y^{out}_2 } \frac{\partial y^{out}_2 }{\partial y^{in}_2} \frac{\partial y^{in}_2}{\partial W^2_{1,2}} & \frac{\partial Loss}{\partial y^{out}_3 } \frac{\partial y^{out}_3 }{\partial y^{in}_3} \frac{\partial y^{in}_3}{\partial W^2_{1,3}} \\ \frac{\partial Loss}{\partial y^{out}_1 } \frac{\partial y^{out}_1 }{\partial y^{in}_1} \frac{\partial y^{in}_1}{\partial W^2_{2,1}} & \frac{\partial Loss}{\partial y^{out}_2 } \frac{\partial y^{out}_2 }{\partial y^{in}_2} \frac{\partial y^{in}_2}{\partial W^2_{2,2}} & \frac{\partial Loss}{\partial y^{out}_3 } \frac{\partial y^{out}_3 }{\partial y^{in}_3} \frac{\partial y^{in}_3}{\partial W^2_{2,3}} \end{array}\right]  =\\ \left[\begin{array}{ccc}  (y^{out}_1 - target_1) \cdot 1 \cdot h^{out}_1 & (y^{out}_2 - target_2) \cdot 1 \cdot h^{out}_1 & (y^{out}_3 - target_3) \cdot 1 \cdot h^{out}_1 \\ (y^{out}_1 - target_1) \cdot 1 \cdot h^{out}_2 & (y^{out}_2 - target_2) \cdot 1 \cdot h^{out}_2 & (y^{out}_3 - target_3) \cdot 1 \cdot h^{out}_2 \\ \end{array}\right]  \end{gather*}
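The final matrix above is just the product of each hidden output with each output error. A small Python sketch of that calculation follows, assuming `W2[i][j]` connects hidden node `i` to output node `j` as the indices in the matrix suggest. The function name and the sample values are hypothetical.

```python
def output_weight_gradients(hidden_out, outputs, targets):
    """grad[i][j] = (y_out[j] - target[j]) * 1 * h_out[i].

    The factor 1 is the derivative of the linear activation
    in the output layer.
    """
    return [
        [(y - t) * 1 * h for y, t in zip(outputs, targets)]
        for h in hidden_out
    ]


# Hypothetical values: two hidden nodes, three output nodes.
h_out = [0.5, 2.0]
y_out = [1.0, 2.0, 4.0]
target = [1.0, 4.0, 1.0]

grad_w2 = output_weight_gradients(h_out, y_out, target)
for row in grad_w2:
    print(row)
# [0.0, -1.0, 1.5]
# [0.0, -4.0, 6.0]
```

Row `i` of the result matches row `i` of the matrix: every entry in it is multiplied by the same hidden output.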

Now, the same for biases:

    \begin{gather*} \left[\begin{array}{ccc} \frac{\partial Loss}{\partial b^2_{1,1}} & \frac{\partial Loss}{\partial b^2_{1,2}} & \frac{\partial Loss}{\partial b^2_{1,3}} \\ \frac{\partial Loss}{\partial b^2_{2,1}} & \frac{\partial Loss}{\partial b^2_{2,2}} & \frac{\partial Loss}{\partial b^2_{2,3}} \end{array}\right] =  \left[\begin{array}{ccc}  \frac{\partial Loss}{\partial y^{out}_1 } \frac{\partial y^{out}_1 }{\partial y^{in}_1} \frac{\partial y^{in}_1}{\partial b^2_{1,1}} & \frac{\partial Loss}{\partial y^{out}_2 } \frac{\partial y^{out}_2 }{\partial y^{in}_2} \frac{\partial y^{in}_2}{\partial b^2_{1,2}} & \frac{\partial Loss}{\partial y^{out}_3 } \frac{\partial y^{out}_3 }{\partial y^{in}_3} \frac{\partial y^{in}_3}{\partial b^2_{1,3}} \\ \frac{\partial Loss}{\partial y^{out}_1 } \frac{\partial y^{out}_1 }{\partial y^{in}_1} \frac{\partial y^{in}_1}{\partial b^2_{2,1}} & \frac{\partial Loss}{\partial y^{out}_2 } \frac{\partial y^{out}_2 }{\partial y^{in}_2} \frac{\partial y^{in}_2}{\partial b^2_{2,2}} & \frac{\partial Loss}{\partial y^{out}_3 } \frac{\partial y^{out}_3 }{\partial y^{in}_3} \frac{\partial y^{in}_3}{\partial b^2_{2,3}} \end{array}\right]  =\\ \left[\begin{array}{ccc}  (y^{out}_1 - target_1) \cdot 1 \cdot 1 & (y^{out}_2 - target_2) \cdot 1 \cdot 1 & (y^{out}_3 - target_3) \cdot 1 \cdot 1 \\ (y^{out}_1 - target_1) \cdot 1 \cdot 1 & (y^{out}_2 - target_2) \cdot 1 \cdot 1 & (y^{out}_3 - target_3) \cdot 1 \cdot 1 \\ \end{array}\right]  \end{gather*}

That was easy. Now we use a learning rate equal to 0.1, and we can update both the weights and the biases between the hidden layer and the output layer.
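As a rough illustration of the update, the sketch below applies the usual gradient descent rule (new value = old value minus learning rate times gradient) to a weight matrix and a bias matrix. The rule is the standard one and is assumed here, not copied from the SQL code. The starting weights, biases, and gradients are made-up numbers.

```python
LEARNING_RATE = 0.1  # the value used in this post


def gradient_descent_step(params, grads, learning_rate=LEARNING_RATE):
    """Return params - learning_rate * grads, element by element."""
    return [
        [p - learning_rate * g for p, g in zip(p_row, g_row)]
        for p_row, g_row in zip(params, grads)
    ]


# Hypothetical current values and gradients (2 hidden nodes x 3 output nodes).
w2 = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
b2 = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
grad_w2 = [[0.0, -1.0, 1.5], [0.0, -4.0, 6.0]]
grad_b2 = [[0.0, -2.0, 3.0], [0.0, -2.0, 3.0]]

new_w2 = gradient_descent_step(w2, grad_w2)
new_b2 = gradient_descent_step(b2, grad_b2)
print(new_w2)
print(new_b2)
```

The same helper works for weights and biases because both are stored here as matrices of the same shape, one value per connection.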

The other updates work in a similar way. If you are lost, you can find a great explanation here.
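For the weights between the input layer and the hidden layer, the chain rule gets one more step: the output errors are passed back through the hidden-to-output weights and then multiplied by the ReLU derivative. The following Python sketch shows one way to write this. It assumes that `W1[i][j]` connects input node `i` to hidden node `j`, that the total loss is the sum of the per-output losses, and that the ReLU derivative at 0 is taken as 0. All names and numbers are hypothetical.

```python
def relu_derivative(x):
    """Derivative of ReLU; the value at x == 0 is taken as 0 by convention."""
    return 1.0 if x > 0 else 0.0


def hidden_weight_gradients(inputs_out, hidden_in, w2, outputs, targets):
    """grad[i][j] = x_out[i] * relu'(h_in[j]) * sum_k (y_out[k] - target[k]) * w2[j][k]."""
    output_errors = [y - t for y, t in zip(outputs, targets)]
    # Error signal arriving at each hidden node, gated by the ReLU derivative.
    hidden_deltas = [
        relu_derivative(h) * sum(e * w for e, w in zip(output_errors, w2_row))
        for h, w2_row in zip(hidden_in, w2)
    ]
    return [[x * d for d in hidden_deltas] for x in inputs_out]


# Hypothetical values: 3 inputs, 2 hidden nodes, 3 outputs.
x_out = [1.0, 2.0, 3.0]
h_in = [0.5, -1.0]  # the second hidden node is inactive under ReLU
w2 = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
y_out = [1.0, 2.0, 4.0]
target = [1.0, 4.0, 1.0]

grad_w1 = hidden_weight_gradients(x_out, h_in, w2, y_out, target)
for row in grad_w1:
    print(row)
# [5.0, 0.0]
# [10.0, 0.0]
# [15.0, 0.0]
```

The second column is all zeros because the second hidden node had a negative input, so ReLU blocks the error from flowing back through it.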

Let’s now see the code:

It is very similar to the solution from the previous post. This time, in phase 5 we calculate the error, and in phase 6 we update the weights and biases. You can find the results here.