This is the first part of the Machine Learning series. For your convenience you can find other parts using the links below (or by guessing the address):
Part 1 — Linear regression in MXNet
Part 2 — Linear regression in SQL
Part 3 — Linear regression in SQL revisited
Part 4 — Linear regression in T-SQL
In this series I assume you know the basics of machine learning. I will provide some source code for different use cases but no extensive explanation. Let’s go.
Today we will take a look at linear regression in MXNet. We will predict sepal length in the well-known Iris dataset.
I assume you have the dataset uploaded to S3. Let’s go with loading the dataset:
    from mxnet import nd, autograd
    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.model_selection import train_test_split

    # Read the CSV into a DataFrame; the first row holds the column names
    local_file = "/tmp/Iris.csv"
    df = pd.read_csv("/blabla/" + local_file, delimiter=',', header=0)
    print(df.shape)
We can see some records with

    print(df.head(3))

or check the different iris categories by counting the rows for each value of the iris column.
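The category check can be sketched without pandas as a simple count per label. The labels below are made-up stand-ins for the iris column, not rows from the real file; on the actual DataFrame the equivalent would be something like df['iris'].value_counts().

```python
from collections import Counter

# Hypothetical stand-in for the 'iris' column of the DataFrame
iris_column = ["setosa", "versicolor", "virginica", "setosa", "virginica", "setosa"]

# Count how many rows fall into each category
category_counts = Counter(iris_column)

for name, count in sorted(category_counts.items()):
    print(name, count)
```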
We have one target variable and four features. Let’s create two additional ones:
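    # Indicator column: 1 for setosa rows, 0 otherwise
    df['i_setosa'] = 0
    df.loc[(df['iris'] == 'setosa'), 'i_setosa'] = 1
    # Indicator column: 1 for versicolor rows, 0 otherwise
    df['i_versicolor'] = 0
    df.loc[(df['iris'] == 'versicolor'), 'i_versicolor'] = 1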
These two features work like a one-hot encoding of the categorical iris feature; the remaining category is the one where both are zero.
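To see why two indicator columns are enough for three categories, here is a small sketch in plain Python. The helper name encode_species is hypothetical and the species list assumes the usual three Iris classes; virginica becomes the baseline where both indicators are zero.

```python
def encode_species(species):
    """Return (i_setosa, i_versicolor) for one species label."""
    i_setosa = 1 if species == "setosa" else 0
    i_versicolor = 1 if species == "versicolor" else 0
    return i_setosa, i_versicolor

# Each of the three labels maps to a distinct pair of indicators
codes = {name: encode_species(name) for name in ["setosa", "versicolor", "virginica"]}
for name, code in codes.items():
    print(name, code)
```

A third indicator would be redundant next to the bias term, since the three columns would always sum to one.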
Time to prepare training and test datasets with:
    df_train, df_test = train_test_split(df, test_size=0.3, random_state=1)
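Conceptually, the split shuffles the row indices and sets aside 30% of them for testing. The sketch below mimics that in plain Python for 150 rows (the usual size of the Iris dataset, assumed here); split_indices is a hypothetical helper, and it will not reproduce the exact rows that train_test_split picks for random_state=1.

```python
import random

def split_indices(n_rows, test_size, seed):
    """Shuffle row indices and split them into train and test parts."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded, so the split is repeatable
    n_test = int(round(n_rows * test_size))
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = split_indices(n_rows=150, test_size=0.3, seed=1)
print(len(train_idx), len(test_idx))
```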
Let’s get down to training. We start with defining the training variables and the target one:
    independent_var = ['sepal_width', 'petal_length', 'petal_width', 'i_setosa', 'i_versicolor']

    # Convert the pandas data to MXNet NDArrays
    y_train = nd.array(df_train['sepal_length'])
    X_train = nd.array(df_train[independent_var])
    y_test = nd.array(df_test['sepal_length'])
    X_test = nd.array(df_test[independent_var])
Let’s prepare a class representing the data instance:

    class data:
        def __init__(self, X, y):
            self.X = nd.array(X)
            self.y = nd.array(y)
            cols = X.shape[1]  # number of features
            self.initialize_parameter(cols)

        def initialize_parameter(self, cols):
            # Random initial weights (one per feature) and bias
            self.w = nd.random.normal(shape=[cols, 1])
            self.b = nd.random.normal(shape=1)
            self.params = [self.w, self.b]
            # Allocate gradient buffers so autograd can fill them in
            for x in self.params:
                x.attach_grad()
We initialize the parameters and attach gradient calculation to them. This is a very nice feature: we don’t need to work out the derivatives ourselves, because autograd takes care of that for us.
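For intuition about what autograd computes here, the gradients of a mean squared error can be written out by hand and checked numerically. This is a plain-Python sketch for a single feature with made-up numbers, not MXNet code; the names mse, analytic_grads and numeric_grads are hypothetical.

```python
def mse(w, b, xs, ys):
    """Mean squared error of the line w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def analytic_grads(w, b, xs, ys):
    """Derivatives of mse with respect to w and b, derived by hand."""
    n = len(xs)
    dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return dw, db

def numeric_grads(w, b, xs, ys, eps=1e-6):
    """Central finite differences, used only as a sanity check."""
    dw = (mse(w + eps, b, xs, ys) - mse(w - eps, b, xs, ys)) / (2 * eps)
    db = (mse(w, b + eps, xs, ys) - mse(w, b - eps, xs, ys)) / (2 * eps)
    return dw, db

xs = [1.0, 2.0, 3.0, 4.0]   # made-up feature values
ys = [3.1, 4.9, 7.2, 8.8]   # made-up targets
grads_by_hand = analytic_grads(0.5, 0.0, xs, ys)
grads_numeric = numeric_grads(0.5, 0.0, xs, ys)
print(grads_by_hand)
print(grads_numeric)
```

The two pairs should agree up to rounding, which is the kind of bookkeeping that attach_grad() and backward() spare us from.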
Let’s now carry on with a single gradient descent step:

    class optimizer:
        def __init__(self):
            pass

        def GD(self, data_instance, lr):
            # Move every parameter against its gradient, in place
            for x in data_instance.params:
                x[:] = x - x.grad * lr
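We just subtract the gradient multiplied by the learning rate. Also, we assign to x[:] instead of x, so the update is written into the existing array and it keeps its attached gradient. If we go with the latter and rebind the parameter to a new array, we will see the following error: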
Check failed: !AGInfo::IsNone(*i) Cannot differentiate node because it is not in a computational graph. You need to set is_recording to true or use autograd.record() to save computational graphs for backward. If you want to differentiate the same graph twice, you need to pass retain_graph=True to backward.
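The difference between the two assignments can be illustrated with ordinary Python lists standing in for NDArrays (an analogy, not MXNet behaviour itself): slice assignment writes into the existing object, while plain assignment only rebinds the loop variable.

```python
def update_in_place(params, step):
    for x in params:
        # Slice assignment overwrites the contents of the existing list
        x[:] = [value - step for value in x]

def update_by_rebinding(params, step):
    for x in params:
        # Plain assignment creates a new list and rebinds the local name only
        x = [value - step for value in x]

w_in_place = [1.0, 2.0]
w_rebound = [1.0, 2.0]
update_in_place([w_in_place], 0.5)
update_by_rebinding([w_rebound], 0.5)

print(w_in_place)  # contents changed
print(w_rebound)   # contents unchanged
```

In the MXNet version, writing in place means the array that had attach_grad() called on it is the one that keeps being used.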
Now, let’s train our model:
    def main():
        # Modeling parameters
        learning_rate = 1e-2
        num_iters = 100

        data_instance = data(X_train, y_train)
        opt = optimizer()
        gd = optimizer.GD
        loss_sequence = []
        # Keep the target as a column so it lines up with the (n, 1) prediction
        y = y_train.reshape((-1, 1))

        for iteration in range(num_iters):
            # Record the forward pass so autograd can differentiate it
            with autograd.record():
                loss = nd.mean((nd.dot(X_train, data_instance.w) + data_instance.b - y)**2)
            loss.backward()
            gd(opt, data_instance, learning_rate)
            print("iteration %s, Mean loss: %s" % (iteration, loss))
            loss_sequence.append(loss.asscalar())

        # Plot the loss for each iteration
        plt.figure(num=None, figsize=(8, 6))
        plt.plot(loss_sequence)
        plt.xlabel('iteration', fontsize=14)
        plt.ylabel('Mean loss', fontsize=14)
        plt.show()
        return data_instance

    data_instance = main()
We should get a plot of the mean loss against the iteration number.
Note that our loss uses the mean of the squared errors rather than their sum. However, with the sum we would run into overflow problems.
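The effect of summing instead of averaging can be sketched in plain Python: the gradient of a summed loss is n times the gradient of the mean loss, so the same learning rate takes steps that are n times larger. The data and the helper train below are made up for illustration, with 105 samples to match a 70% split of 150 rows.

```python
def train(xs, ys, lr, num_iters, use_sum):
    """Fit y = w * x by gradient descent on a summed or averaged squared error."""
    w = 0.0
    scale = 1.0 if use_sum else 1.0 / len(xs)
    for _ in range(num_iters):
        grad = scale * sum(2 * (w * x - y) * x for x, y in zip(xs, ys))
        w -= lr * grad
    return w

xs = [1.0 + 0.5 * (i % 5) for i in range(105)]  # made-up feature values
ys = [2.0 * x for x in xs]                       # targets from a true slope of 2

w_mean = train(xs, ys, lr=1e-2, num_iters=100, use_sum=False)
w_sum = train(xs, ys, lr=1e-2, num_iters=100, use_sum=True)
print("mean loss:", w_mean)
print("sum loss: ", w_sum)
```

With these numbers the averaged version settles near the true slope, while the summed version moves further away on every step; in 32-bit floats that kind of growth would eventually overflow.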
Finally, let’s check the performance of the trained model:

    # Same column reshape as in training, so the errors are computed row by row
    MSE = nd.mean(((nd.dot(X_test, data_instance.w) + data_instance.b) - y_test.reshape((-1, 1)))**2)
    print("Mean Squared Error on Test Set: %s" % (MSE))
We can see that linear regression is pretty concise and easy. However, this uses Python and Spark, which we might want to avoid. In the next parts we will take a look at different solutions.