Machine Learning Part 4 — Linear regression in T-SQL

This is the fourth part of the ML series. For your convenience you can find other parts in the table of contents in Part 1 – Linear regression in MXNet

This time we are going to implement linear regression as a function. This gives us a little more flexibility in terms of debugging the code and reading it later, also, we can implement much more complex algorithms. Too bad, we can’t use this in Redshift at this time as it doesn’t support such functions or stored procedures. So I will use T-SQL and test the code with MS SQL 2017. I assume you have table samples with Iris dataset.

We start with declaring a type for the function parameter:

Next, let’s prepare samples for training:

We generate table with numbers, next add more features, and then pivot them just like in the last part.

Finally, our function:

We extract featureIds so we can pass basically any dataset for training and it should work. We initialize coefficients with default values, do the same with gradients, and prepare some bookkeeping like iterations count or learning rate.

Next, in every iteration we start with calculating new coefficients based on old coefficients and old gradients. We clear distances table and calculate distance (which is the difference between predicted value and expected value) for each sample. Next, we calculate mean squared error.

Next, we need to calculate new gradients. For each feature we calculate the derivatives and we are done. We just need to store new coefficients and increase the counter.

Now we can execute the code:

And the result is: