Python – Random IT Utensils https://blog.adamfurmanek.pl IT, operating systems, maths, and more. Fri, 22 Mar 2024 20:53:42 +0000 en-US hourly 1 https://wordpress.org/?v=6.6.2 Types and Programming Languages Part 2 — Exception while handling exception https://blog.adamfurmanek.pl/2021/01/16/types-and-programming-languages-part-2/ https://blog.adamfurmanek.pl/2021/01/16/types-and-programming-languages-part-2/#comments Sat, 16 Jan 2021 09:00:12 +0000 https://blog.adamfurmanek.pl/?p=3725 Continue reading Types and Programming Languages Part 2 — Exception while handling exception]]>

This is the second part of the Types and Programming Languages series. For your convenience you can find other parts in the table of contents in Part 1 — Do not return in finally

Last time we saw what happens when we return in finally and that we shouldn’t do it. Today we explore a similar case of exception while handling exception. Let’s take this code in C#:

try{
	try{
		throw new Exception("Exception 1");
	}finally{
		throw new Exception("Exception 2");
	}
}catch(Exception e){
	Console.WriteLine(e);
}

What’s the output?

This question is a bit tricky. First, there are two exceptions in place and we know that typically various languages (including .NET platform) implement a two-pass exception system. First pass traverses the stack and looks for some handler capable of handling the exception, then second pass unwinds the stack and executes all finally blocks. But what if we throw exception in the second pass?

That depends and differs between languages. For instance, C# loses the exception, as specified by C# language specification:

If the finally block throws another exception, processing of the current exception is terminated.

Python 2 does the same, but Python 3 in PEP 3134 changes that:

The proposed semantics are as follows:
1. Each thread has an exception context initially set to None.
2. Whenever an exception is raised, if the exception instance does not already have a __context__ attribute, the interpreter sets it equal to the thread's exception context.
3. Immediately after an exception is raised, the thread's exception context is set to the exception.
4. Whenever the interpreter exits an except block by reaching the end or executing a return, yield, continue, or break statement, the thread's exception context is set to None.

It’s worth noting that some languages provide a field in the exception class which is supposed to store the previous one but if it’s not set automatically by the platform then the original problem still exists. What’s more, if that field is read only then it’s hard to fix the issue in place.

This is important when handling resources. Some languages provide a construct try with resources, for instance Java:

try (BufferedReader br = new BufferedReader(new FileReader(path))) {
	return br.readLine();
}

If it was implemented like this:

BufferedReader br = new BufferedReader(new FileReader(path));
try {
	return br.readLine();
} finally {
	if (br != null) br.close();
}

then exception thrown in finally block would erase the previous one. This is for instance how it’s implemented in C#. Java does it right, though.

]]>
https://blog.adamfurmanek.pl/2021/01/16/types-and-programming-languages-part-2/feed/ 1
Types and Programming Languages Part 1 — Do not return in finally https://blog.adamfurmanek.pl/2021/01/09/types-and-programming-languages-part-1/ https://blog.adamfurmanek.pl/2021/01/09/types-and-programming-languages-part-1/#comments Sat, 09 Jan 2021 09:00:37 +0000 https://blog.adamfurmanek.pl/?p=3712 Continue reading Types and Programming Languages Part 1 — Do not return in finally]]>

This is the first part of the Types and Programming Languages series. For your convenience you can find other parts using the links below:
Part 1 — Do not return in finally
Part 2 — Exception while handling exception
Part 3 — Finally during termination
Part 4 – Diamond problem
Part 5 – Sleeping and measuring time
Part 6 – Liskov substitution principle
Part 7 – Four types of experience in software engineering
Part 8 – Testing – is it worth it?
Part 9 – Real life testing
Part 10 – Dogmatic TDD
Part 11 – Principles of good debugging
Part 12 – A word on estimates and Story Points
Part 13 – Long-term planning
Part 14 – Pure functions vs impure ones
Part 15 – Prohibit vs Enable in the software engineering
Part 16 – Encapsulation and making all public
Part 17 – LSP in practice
Part 18 – Your defaults influence the way you think
Part 19 – Senior or Expert or what?
Part 20 – Measuring how hard your work is

Many languages provide exception handling construct, typically in a form of try and catch blocks. While details differ in terms what can be thrown, what can be handled etc, programmers generally assume these constructs work the same across languages. That’s not true, unfortunately, and details are often tricky when it comes to edge cases. We’ll cover that one day.

Some languages support additional block called finally which is supposed to be executed “no matter what” — whether the exception was thrown or not. That’s obviously not true, there are many situations when they may not be called, for instance unhandled exceptions, exiting the application, access violations (or segfaults) etc. I won’t be covering the details now, we’ll get to that some other time. What we’ll cover today is returning in finally.

Some languages let you return value from the finally block. Typical implementation makes the last returned value “win” over others. Let’s take this Java code:

class Ideone
{
	public static void main (String[] args) throws java.lang.Exception
	{
		System.out.println(foo());
	}
 
	public static int foo(){
		try{
			return 5;
		}finally{
			return 42;
		}
	}
}

The output is 42, because that’s the last returned value. You can observe the same behavior in Python, JS, Windows SEH, probably other platforms as well. Take your favorite language and check it out. One notable exception here is C# which doesn’t allow to return in finally, just to avoid this confusion.

Seems simple and makes sense. But what happens if you throw exception and then return? Let’s take this Java code:

class Ideone
{
	public static void main (String[] args) throws java.lang.Exception
	{
		System.out.println(foo());
	}
	
	public static int foo(){
		try{
			throw new RuntimeException("This disappears");
		}finally{
			return 42;
		}
	}
}

What’s the output? It’s still 42. Exception was lost.

You can see the same in JS:

function foo(){
  try{
    throw "This disappears";
  }finally{
    return 42;
  }
}
console.log(foo());

Python:

def foo():
  try:
    raise Exception("This disappears")
  finally:
    return 42

print(foo())

SEH:

#include "stdafx.h"
#include 

int Filter(){
	return EXCEPTION_EXECUTE_HANDLER;
}

int Foo(){
	__try{
		printf("%d", 1 / divideByZero);
		return 5;
	}
	__finally{
		return 42;
	}
}

int _tmain(int argc, _TCHAR* argv[])
{
	__try{
		printf("%d", Foo());
	}
	__except(Filter()){
	}
	return 0;
}

As a rule of thumb, never return in finally. It breaks exception handling mechanism.

]]>
https://blog.adamfurmanek.pl/2021/01/09/types-and-programming-languages-part-1/feed/ 7
RuntimeError: generator raised StopIteration in PySpark in Python 3.7.3 https://blog.adamfurmanek.pl/2020/08/15/runtimeerror-generator-raised-stopiteration-in-pyspark-in-python-3-7-3/ https://blog.adamfurmanek.pl/2020/08/15/runtimeerror-generator-raised-stopiteration-in-pyspark-in-python-3-7-3/#respond Sat, 15 Aug 2020 08:00:28 +0000 https://blog.adamfurmanek.pl/?p=3426 Continue reading RuntimeError: generator raised StopIteration in PySpark in Python 3.7.3]]> Recently I was debugging this simple PySpark code:

someDataframe = ...
someDict = someDataframe.collectAsMap()
someOtherDataframe.filter(lambda x: x in someDict).take(1)

First, we get some DataFrame. Next, we collect it to dictionary. It doesn’t matter how you create the dictionary, actually it could be a set or list as well. Finally, we do some filtering with lambda using in operator.

I was running this in Python 3.7.3 and was getting this exception:

[Stage 8:>                                                          (0 + 1) / 1]20/02/20 20:55:37 WARN TaskSetManager: Lost task 0.0 in stage 8.0 (TID 343, ip-10-0-0-2.ec2.internal, executor 20): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/usr/lib/spark/python/pyspark/rdd.py", line 1371, in takeUpToNumLeft
StopIteration

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/mnt/yarn/usercache/hadoop/appcache/application_1/container_1/pyspark.zip/pyspark/worker.py", line 230, in main
    process()
  File "/mnt/yarn/usercache/hadoop/appcache/application_1/container_1/pyspark.zip/pyspark/worker.py", line 225, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/mnt/yarn/usercache/hadoop/appcache/application_1/container_1/pyspark.zip/pyspark/serializers.py", line 372, in dump_stream
    vs = list(itertools.islice(iterator, batch))
RuntimeError: generator raised StopIteration

It looks like this is related to in operator as replacing lambda with something like list(someDict.keys()).count(x) >= 0 stopped exceptions.
This is due to new generators behavior.
Similar issue was here, here and here.

]]>
https://blog.adamfurmanek.pl/2020/08/15/runtimeerror-generator-raised-stopiteration-in-pyspark-in-python-3-7-3/feed/ 0
Running Anaconda with DGL and mxnet on CUDA GPU in Spark running in EMR https://blog.adamfurmanek.pl/2020/03/28/running-anaconda-with-dgl-and-mxnet-on-cuda-gpu-in-spark-running-in-emr/ https://blog.adamfurmanek.pl/2020/03/28/running-anaconda-with-dgl-and-mxnet-on-cuda-gpu-in-spark-running-in-emr/#respond Sat, 28 Mar 2020 09:00:14 +0000 https://blog.adamfurmanek.pl/?p=3266 Continue reading Running Anaconda with DGL and mxnet on CUDA GPU in Spark running in EMR]]> Today I’m going to share my configuration for running custom Anaconda Python with DGL (Deep Graph Library) and mxnet library, with GPU support via CUDA, running in Spark hosted in EMR. Actually, I have Redshift configuration as well, with support for gensim, tensorflow, keras, theano, pygpu, and cloudpickle. You can also install more libraries if needed. All this for Google to index keywords. Let’s begin.

My configuration uses EMR 5.17.2 and CUDA 9.2. When I’m writing it, there is EMR 5.27 available but it comes with the same CUDA version so I presume it should work as well. I’m also using Python 3.7.

First, create a cluster. Do not select mxnet as a provided library in EMR, we will install it later. As a master node use p3.8xlarge instance type — this instance must have GPU and this is where we will run DGL and mxnet. For slaves you can use anything, I’m going with 19 r3.4xlarge nodes (they don’t have GPU).

We need to install some custom libraries. I am using bootstrap script for that but you can just SSH into the host manually and run this code:

sudo mkdir /mnt/usr-moved
sudo mv /usr/local /mnt/usr-moved/
sudo ln -s /mnt/usr-moved/local /usr/
sudo mv /usr/share /mnt/usr-moved/
sudo ln -s /mnt/usr-moved/share /usr/

sudo mkdir /mnt/home
sudo chown hadoop.hadoop /mnt/home

wget https://repo.anaconda.com/archive/Anaconda3-2019.03-Linux-x86_64.sh -O ~/anaconda.sh
bash ~/anaconda.sh -b -p /mnt/home/hadoop/anaconda
echo -e '\nexport PATH=/mnt/home/hadoop/anaconda/bin:$PATH' >> $HOME/.bashrc && source $HOME/.bashrc
echo -e '\nexport PYSPARK_PYTHON=/mnt/home/hadoop/anaconda/bin/python' >> $HOME/.bashrc && source $HOME/.bashrc

/mnt/home/hadoop/anaconda/bin/conda install -y gensim
/mnt/home/hadoop/anaconda/bin/conda install -y tensorflow
/mnt/home/hadoop/anaconda/bin/conda install -y keras
/mnt/home/hadoop/anaconda/bin/conda install -y theano
/mnt/home/hadoop/anaconda/bin/conda install -y pygpu
/mnt/home/hadoop/anaconda/bin/conda upgrade -y cloudpickle
yes | sudo /mnt/home/hadoop/anaconda/bin/pip install --pre mxnet-cu92mkl
/mnt/home/hadoop/anaconda/bin/conda install -y -c dglteam dgl-cuda9.2
/mnt/home/hadoop/anaconda/bin/conda install -y s3fs

First, I’m making a symlink to not fill the disk while installing packages. Then in line 10 I download Anaconda. Finally, lines 15-23 install some additional libraries. Notice that in line 21 I install mxnet compiled for CUDA 9.2, and in line 22 the same for DGL. Also, s3fs is required for nice reading from s3.

When this is done and cluster is created, I replace Python for Zeppelin interpreter to point to /mnt/home/hadoop/anaconda/bin/python and add Redshift configuration. I do this with the following command line (this you need to run manually after the cluster is created):

sudo /usr/lib/zeppelin/bin/install-interpreter.sh --name jdbc
cd /usr/lib/zeppelin/interpreter/jdbc/
sudo wget https://s3.amazonaws.com/redshift-downloads/drivers/jdbc/1.2.20.1043/RedshiftJDBC42-no-awssdk-1.2.20.1043.jar

sudo cat | sudo tee /etc/zeppelin/conf/interpreter.json <<'endmsg'
{
  "interpreterSettings": {
    "2ANGGHHMQ": {
      "id": "2ANGGHHMQ",
      "name": "spark",
      "group": "spark",
      "properties": {
        "zeppelin.spark.printREPLOutput": "true",
        "spark.yarn.jar": "",
        "master": "yarn-client",
        "zeppelin.spark.maxResult": "1000",
        "zeppelin.dep.localrepo": "/usr/lib/zeppelin/local-repo",
        "spark.app.name": "Zeppelin",
        "zeppelin.spark.importImplicit": "true",
        "zeppelin.spark.useHiveContext": "true",
        "args": "",
        "spark.home": "/usr/lib/spark",
        "zeppelin.spark.concurrentSQL": "false",
        "zeppelin.pyspark.python": "/mnt/home/hadoop/anaconda/bin/python"
      },
      "status": "READY",
      "interpreterGroup": [
        {
          "name": "spark",
          "class": "org.apache.zeppelin.spark.SparkInterpreter",
          "defaultInterpreter": false,
          "editor": {
            "language": "scala",
            "editOnDblClick": false
          }
        },
        {
          "name": "pyspark",
          "class": "org.apache.zeppelin.spark.PySparkInterpreter",
          "defaultInterpreter": false,
          "editor": {
            "language": "python",
            "editOnDblClick": false
          }
        },
        {
          "name": "sql",
          "class": "org.apache.zeppelin.spark.SparkSqlInterpreter",
          "defaultInterpreter": false,
          "editor": {
            "language": "sql",
            "editOnDblClick": false
          }
        }
      ],
      "dependencies": [
        {
          "groupArtifactVersion": "/usr/lib/zeppelin/interpreter/jdbc/RedshiftJDBC42-no-awssdk-1.2.20.1043.jar",
          "local": false
        }
      ],
      "option": {
        "remote": true,
        "port": -1,
        "perNote": "shared",
        "perUser": "shared",
        "isExistingProcess": false,
                "setPermission": false,
        "users": [],
        "isUserImpersonate": false
      }
    },
    "2AM1YV5CU": {
      "id": "2AM1YV5CU",
      "name": "angular",
      "group": "angular",
      "properties": {},
      "status": "READY",
      "interpreterGroup": [
        {
          "name": "angular",
          "class": "org.apache.zeppelin.angular.AngularInterpreter",
          "defaultInterpreter": false,
          "editor": {
            "editOnDblClick": true
          }
        }
      ],
      "dependencies": [],
      "option": {
        "remote": true,
        "port": -1,
        "perNote": "shared",
        "perUser": "shared",
        "isExistingProcess": false,
        "setPermission": false,
        "isUserImpersonate": false
      }
    },
    "2BRWU4WXC": {
      "id": "2BRWU4WXC",
      "name": "python",
      "group": "python",
      "properties": {
        "zeppelin.python": "/mnt/home/hadoop/anaconda/bin/python",
        "zeppelin.python.maxResult": "1000"
      },
      "status": "READY",
      "interpreterGroup": [
        {
          "name": "python",
          "class": "org.apache.zeppelin.python.PythonInterpreter",
          "defaultInterpreter": false,
          "editor": {
            "language": "python",
            "editOnDblClick": false
          }
        }
      ],
      "dependencies": [],
      "option": {
        "remote": true,
        "port": -1,
        "perNote": "shared",
        "perUser": "shared",
        "isExistingProcess": false,
        "setPermission": false,
        "users": [],
        "isUserImpersonate": false
              }
    },
    "2AJXGMUUJ": {
      "id": "2AJXGMUUJ",
      "name": "md",
      "group": "md",
      "properties": {},
      "status": "READY",
      "interpreterGroup": [
        {
          "name": "md",
          "class": "org.apache.zeppelin.markdown.Markdown",
          "defaultInterpreter": false,
          "editor": {
            "language": "markdown",
            "editOnDblClick": true
          }
        }
      ],
      "dependencies": [],
      "option": {
        "remote": true,
        "port": -1,
        "perNote": "shared",
        "perUser": "shared",
        "isExistingProcess": false,
        "setPermission": false,
        "isUserImpersonate": false
      }
    },
    "2AKK3QQXU": {
      "id": "2AKK3QQXU",
      "name": "sh",
      "group": "sh",
      "properties": {
        "shell.command.timeout.millisecs": "60000"
      },
      "status": "READY",
      "interpreterGroup": [
        {
          "name": "sh",
          "class": "org.apache.zeppelin.shell.ShellInterpreter",
          "defaultInterpreter": false,
          "editor": {
            "language": "sh",
            "editOnDblClick": false
          }
        }
      ],
      "dependencies": [],
      "option": {
        "remote": true,
        "port": -1,
        "perNote": "shared",
        "perUser": "shared",
        "isExistingProcess": false,
        "setPermission": false,
        "isUserImpersonate": false
      }
    }
  },
  "interpreterBindings": {
    "2EMW16J14": [
      "2ANGGHHMQ",
      "2AJXGMUUJ",
      "2AM1YV5CU",
      "2AKK3QQXU",
      "2BRWU4WXC"
    ],
    "2A94M5J1Z": [
      "2ANGGHHMQ",
      "2AJXGMUUJ",
      "2AM1YV5CU",
      "2AKK3QQXU",
      "2BRWU4WXC"
    ]
  },
  "interpreterRepositories": [
    {
      "id": "central",
      "type": "default",
      "url": "http://repo1.maven.org/maven2/",
      "releasePolicy": {
      "enabled": true,
        "updatePolicy": "daily",
        "checksumPolicy": "warn"
      },
      "snapshotPolicy": {
        "enabled": true,
        "updatePolicy": "daily",
        "checksumPolicy": "warn"
      },
      "mirroredRepositories": [],
      "repositoryManager": false
    },
    {
      "id": "local",
      "type": "default",
      "url": "file:///var/lib/zeppelin/.m2/repository",
      "releasePolicy": {
        "enabled": true,
        "updatePolicy": "daily",
        "checksumPolicy": "warn"
      },
      "snapshotPolicy": {
        "enabled": true,
        "updatePolicy": "daily",
        "checksumPolicy": "warn"
      },
      "mirroredRepositories": [],
      "repositoryManager": false
    }
  ]
}
endmsg

Now, I need to tune default spark submit options:

sudo cat | sudo tee /etc/zeppelin/conf.dist/zeppelin-env.sh <<'endmsg'
export ZEPPELIN_PORT=8890
export ZEPPELIN_CONF_DIR=/etc/zeppelin/conf
export ZEPPELIN_LOG_DIR=/var/log/zeppelin
export ZEPPELIN_PID_DIR=/var/run/zeppelin
export ZEPPELIN_PID=$ZEPPELIN_PID_DIR/zeppelin.pid
export ZEPPELIN_WAR_TEMPDIR=/var/run/zeppelin/webapps
export ZEPPELIN_NOTEBOOK_DIR=/var/lib/zeppelin/notebook
export MASTER=yarn-client
export SPARK_HOME=/usr/lib/spark
export HADOOP_CONF_DIR=/etc/hadoop/conf
export LD_LIBRARY_PATH=/usr/local/cuda/lib64/
export CLASSPATH=":/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar"


export SPARK_SUBMIT_OPTIONS="--jars=YOUR_JARS_HERE --conf spark.executor.extraClassPath=/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/* --conf spark.driver.extraClassPath=/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/* --conf 'spark.executorEnv.PYTHONPATH=/usr/lib/spark/python/lib/py4j-src.zip:/usr/lib/spark/python/:<CPS>{{PWD}}/pyspark.zip<CPS>{{PWD}}/py4j-src.zip' --conf spark.yarn.isPython=true --conf spark.driver.maxResultSize=80G --num-executors 56 --executor-cores 5 --executor-memory 38G --driver-memory 90G --conf 'spark.dynamicAllocation.enabled=false' --conf 'spark.hadoop.fs.s3.canned.acl=BucketOwnerFullControl' --conf 'spark.hadoop.fs.s3.acl.default=BucketOwnerFullControl' --conf 'spark.hadoop.fs.s3.acl=bucket-owner-full-control' --conf 'spark.hadoop.fs.s3n.acl=bucket-owner-full-control' --conf 'spark.hadoop.fs.s3.cse.enabled=false'"
export ZEPPELIN_NOTEBOOK_STORAGE=org.apache.zeppelin.notebook.repo.S3NotebookRepo
endmsg

This is not the full content! I omit some of my internal settings so generally don’t copy it blindly, just extend the zeppelin-env.sh file as needed. Important things are:
export LD_LIBRARY_PATH=/usr/local/cuda/lib64/ — this points to CUDA libraries
--conf spark.driver.maxResultSize=80G --num-executors 56 --executor-cores 5 --executor-memory 38G --driver-memory 90G --conf 'spark.dynamicAllocation.enabled=false' — this configures executors and memory. You need to tune it for your cluster size.

Now, restart Zeppelin. You should now be able to run:

%spark.pyspark
print(mx.__version__)
print(np.__version__)

1.6.0
1.14.6

Now you can create GPU context:

ctx = mx.gpu(0)

and it should work as a charm.

So now you have power of Spark — you can easily distribute job and use all slaves. And also, you have GPU at your hand, so whenever you use ndarray from mxnet, it can use the GPU power.

If you don’t want to use GPU, then just install these libraries instead:

yes | sudo /mnt/home/hadoop/anaconda/bin/pip install --pre mxnet
/mnt/home/hadoop/anaconda/bin/conda install -y -c dglteam dgl

and use mx.cpu() context. This works as well, obviously, much slower. For my use case GPU calculations were 80 times faster than when running on CPU.

]]>
https://blog.adamfurmanek.pl/2020/03/28/running-anaconda-with-dgl-and-mxnet-on-cuda-gpu-in-spark-running-in-emr/feed/ 0
Machine Learning Part 1 — Linear regression in MXNet https://blog.adamfurmanek.pl/2018/10/20/machine-learning-part-1/ https://blog.adamfurmanek.pl/2018/10/20/machine-learning-part-1/#comments Sat, 20 Oct 2018 08:00:13 +0000 https://blog.adamfurmanek.pl/?p=2629 Continue reading Machine Learning Part 1 — Linear regression in MXNet]]>

This is the first part of the Machine Learning series. For your convenience you can find other parts using the links below (or by guessing the address):
Part 1 — Linear regression in MXNet
Part 2 — Linear regression in SQL
Part 3 — Linear regression in SQL revisited
Part 4 — Linear regression in T-SQL
Part 5 — Linear regression
Part 6 — Matrix multiplication in SQL
Part 7 — Forward propagation in neural net in SQL
Part 8 — Backpropagation in neural net in SQL

In this series I assume you do know basics of machine learning. I will provide some source code for different use cases but no extensive explanation. Let’s go.

Today we will take a look at linear regression in MXNet. We will predict sepal length in well know iris dataset.

I assume you have the dataset uploaded to s3. Let’s go with loading the dataset:

from mxnet import nd, autograd
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

local_file="/tmp/Iris.csv"
df = pd.read_csv("/blabla/" + local_file, delimiter=',', header = 0)

print df.shape

We can see some records with print df.head(3) or check different iris categories with df.iris.unique().

We have one target variable and four features. Let’s create two adttional:

df['i_setosa'] = 0
df.loc[(df['iris']=='setosa'), 'i_setosa']= 1
df['i_versicolor'] = 0
df.loc[(df['iris']=='versicolor'), 'i_versicolor']= 1

Two features similar to one hot encoding of categorical feature.

Time to prepare training and test datasets with: df_train, df_test = train_test_split( df, test_size=0.3, random_state=1)

Let’s get down to training. We start with defining the training variables and the target one:

independent_var = ['sepal_width','petal_length','petal_width','i_setosa','i_versicolor']
y_train = nd.array(df_train['sepal_length'])
X_train = nd.array(df_train[independent_var]) 
y_test = nd.array(df_test['sepal_length'])
X_test = nd.array(df_test[independent_var])

Let’s prepare class representing data instance:

class data:
    def __init__(self,X,y):
        self.X = nd.array(X) 
        self.y = nd.array(y)
        cols = X.shape[1]
        self.initialize_parameter(cols)

    def initialize_parameter(self,cols):
        self.w = nd.random.normal(shape = [cols, 1])
        self.b = nd.random.normal(shape = 1)
        self.params = [self.w, self.b]

        for x in self.params:
            x.attach_grad()

We initialize parameters and attach gradient calculation. This is a very nice feature, we don’t need to take care of derivatives, everything is taken care for us.

Let’s now carry on with a single step for gradient:

class optimizer:
    def __init__(self):
        pass
    
    def GD(self,data_instance,lr):
        for x in data_instance.params:
            x[:] = x - x.grad * lr

We just subtract gradient multiplied by learning rate. Also, we use x[:] instead of x to avoid reinitializing the gradient. If we go with the latter, we will see the following error:

Check failed: !AGInfo::IsNone(*i) Cannot differentiate node because it is not in a computational graph. You need to set is_recording to true or use autograd.record() to save computational graphs for backward. If you want to differentiate the same graph twice, you need to pass retain_graph=True to backward.

Now, let’s train our model:

def main():
    # Modeling parameters
    learning_rate = 1e-2
    num_iters = 100
    
    data_instance = data(X_train,y_train) 

    opt = optimizer()
    gd = optimizer.GD
    
    loss_sequence = []
    
    for iteration in range(num_iters):
        with autograd.record():
            loss = nd.mean((nd.dot(X_train, data_instance.w) + data_instance.b - y_train)**2)
            
        loss.backward()
        gd(opt, data_instance, learning_rate)
        
        print ("iteration %s, Mean loss: %s" % (iteration,loss))
        loss_sequence.append(loss.asscalar())
        
    plt.figure(num=None,figsize=(8, 6))
    plt.plot(loss_sequence)
    plt.xlabel('iteration',fontsize=14)
    plt.ylabel('Mean loss',fontsize=14)

We should get the following:

Training - mean

Note that our loss uses mean whereas we could calculate just a sum. However, due to overflow problems we would get the following:

Training - sum

Finally, let’s check the performance of trained model:

MSE = nd.mean(((nd.dot(X_test, data_instance.w) + data_instance.b) - y_test)**2)
print ("Mean Squared Error on Test Set: %s" % (MSE))

Done.

Summary

We can see that linear regression is pretty concise and easy. However, this uses Python and Spark which me might want to avoid. In next parts we will take a look at different solutions.

]]>
https://blog.adamfurmanek.pl/2018/10/20/machine-learning-part-1/feed/ 4
SQLxD Part 23 — Query parser tests https://blog.adamfurmanek.pl/2017/10/21/sqlxd-part-23/ https://blog.adamfurmanek.pl/2017/10/21/sqlxd-part-23/#respond Sat, 21 Oct 2017 08:00:26 +0000 https://blog.adamfurmanek.pl/?p=2220 Continue reading SQLxD Part 23 — Query parser tests]]>

This is the twenty third part of the SQLxD series. For your convenience you can find other parts in the table of contents in Part 1 – XML Transformation

Here go the tests for query parser:

using System;
using FluentAssertions;
using Model;
using NUnit.Framework;
using QueryLogic.Aggregates;
using QueryLogic.Expressions.CellExpressions;
using QueryLogic.Expressions.RowExpressions;
using QueryLogic.Filtering;
using QueryLogic.Grouping;
using QueryLogic.Joins.Implementation;
using QueryLogic.Ordering;
using QueryLogic.Predicates.Complex;
using QueryLogic.Predicates.Simple;
using QueryLogic.RelationProviding;
using QueryLogic.Selecting;
using QueryLogic.Selectors;
using QueryLogic.Transformers.CellTransformers;
using QueryLogic.Transformers.RowTransformers;
using QueryParser.Parsing;

namespace QueryParserTest
{
    [TestFixture]
    internal class ParserTests
    {
        [TestFixtureSetUp]
        public void Initialize()
        {
            // ReSharper disable once UnusedVariable
            var parser = new Parser();
        }

        private static void PerformTestWithAssert(string query, Select expectedModel, string[] expectedMessage)
        {
            // Arrange
            var parser = new Parser();

            // Act
            Tuple<Select, string[]> actual = parser.Parse(query);
            Select actualModel = actual.Item1;
            string[] actualMessage = actual.Item2;

            // Assert
            Console.WriteLine(string.Join("\n", actualMessage));
            actualMessage.ShouldBeEquivalentTo(expectedMessage);
            actualModel.Should().Be(expectedModel);
        }

        [Test]
        public void Parse_CrossJoin_WithUsingAliasInSelector()
        {
            // Arrange
            const string query = @"SELECT * FROM table AS t CROSS JOIN t.table2 AS t2";

            var firstSelector = new TopLevelSelector("table");
            var secondSelector = new ChainedSelector(new TopLevelSelector("table"), new NodeSelector("table2"));
            var firstFrom = new From(firstSelector, "t");
            var secondFrom = new From(secondSelector, "t2");
            var crossJoin = new CrossJoin(firstFrom, secondFrom);
            var expectedSelect = new Select(crossJoin);
            expectedSelect.AddRowTransformer(new IdentityRowTransformer());

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_CrossJoin_WithoutUsingAliasInSelector()
        {
            // Arrange
            const string query = @"SELECT * FROM table AS t CROSS JOIN table2 AS t2";

            var firstSelector = new TopLevelSelector("table");
            var secondSelector = new TopLevelSelector("table2");
            var firstFrom = new From(firstSelector, "t");
            var secondFrom = new From(secondSelector, "t2");
            var crossJoin = new CrossJoin(firstFrom, secondFrom);
            var expectedSelect = new Select(crossJoin);
            expectedSelect.AddRowTransformer(new IdentityRowTransformer());

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_FromWithOneNestedTable()
        {
            // Arrange
            const string query = @"SELECT * FROM level.table AS t";

            var selector = new ChainedSelector(new TopLevelSelector("level"), new NodeSelector("table"));
            var from = new From(selector, "t");
            var expectedSelect = new Select(from);
            expectedSelect.AddRowTransformer(new IdentityRowTransformer());

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_FromWithOneTableOnTopLevel()
        {
            // Arrange
            const string query = @"SELECT * FROM table AS t";

            var selector = new TopLevelSelector("table");
            var from = new From(selector, "t");
            var expectedSelect = new Select(from);
            expectedSelect.AddRowTransformer(new IdentityRowTransformer());

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_FullJoin_WithUsingAliasInSelector()
        {
            // Arrange
            const string query = @"SELECT * FROM table AS t FULL JOIN t.table2 AS t2 ON t.id = t2.id";

            var firstSelector = new TopLevelSelector("table");
            var secondSelector = new ChainedSelector(new TopLevelSelector("table"), new NodeSelector("table2"));
            var firstFrom = new From(firstSelector, "t");
            var secondFrom = new From(secondSelector, "t2");
            var outerJoin = new FullOuterJoin(firstFrom, secondFrom,
                new EqualPredicate(new GetCellRowExpression(new ColumnHeader("t", "id")),
                    new GetCellRowExpression(new ColumnHeader("t2", "id"))));
            var expectedSelect = new Select(outerJoin);
            expectedSelect.AddRowTransformer(new IdentityRowTransformer());

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_FullJoin_WithoutUsingAliasInSelector()
        {
            // Arrange
            const string query = @"SELECT * FROM table AS t FULL JOIN table2 AS t2 ON t.id = t2.id";

            var firstSelector = new TopLevelSelector("table");
            var secondSelector = new TopLevelSelector("table2");
            var firstFrom = new From(firstSelector, "t");
            var secondFrom = new From(secondSelector, "t2");
            var outerJoin = new FullOuterJoin(firstFrom, secondFrom,
                new EqualPredicate(new GetCellRowExpression(new ColumnHeader("t", "id")),
                    new GetCellRowExpression(new ColumnHeader("t2", "id"))));
            var expectedSelect = new Select(outerJoin);
            expectedSelect.AddRowTransformer(new IdentityRowTransformer());

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_FullOuterJoin_WithUsingAliasInSelector()
        {
            // Arrange
            const string query = @"SELECT * FROM table AS t FULL OUTER JOIN t.table2 AS t2 ON t.id = t2.id";

            var firstSelector = new TopLevelSelector("table");
            var secondSelector = new ChainedSelector(new TopLevelSelector("table"), new NodeSelector("table2"));
            var firstFrom = new From(firstSelector, "t");
            var secondFrom = new From(secondSelector, "t2");
            var outerJoin = new FullOuterJoin(firstFrom, secondFrom,
                new EqualPredicate(new GetCellRowExpression(new ColumnHeader("t", "id")),
                    new GetCellRowExpression(new ColumnHeader("t2", "id"))));
            var expectedSelect = new Select(outerJoin);
            expectedSelect.AddRowTransformer(new IdentityRowTransformer());

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_FullOuterJoin_WithoutUsingAliasInSelector()
        {
            // Arrange
            const string query = @"SELECT * FROM table AS t FULL OUTER JOIN table2 AS t2 ON t.id = t2.id";

            var firstSelector = new TopLevelSelector("table");
            var secondSelector = new TopLevelSelector("table2");
            var firstFrom = new From(firstSelector, "t");
            var secondFrom = new From(secondSelector, "t2");
            var outerJoin = new FullOuterJoin(firstFrom, secondFrom,
                new EqualPredicate(new GetCellRowExpression(new ColumnHeader("t", "id")),
                    new GetCellRowExpression(new ColumnHeader("t2", "id"))));
            var expectedSelect = new Select(outerJoin);
            expectedSelect.AddRowTransformer(new IdentityRowTransformer());

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_GroupBy()
        {
            // Arrange
            const string query = @"SELECT * FROM table AS t GROUP BY t.id, t.name";

            var selector = new TopLevelSelector("table");
            var from = new From(selector, "t");
            var groupBy = new GroupBy(from, new[]
            {
                new GroupingSet(new[]
                {
                    new ColumnHeader("t", "id"),
                    new ColumnHeader("t", "name")
                })
            });
            var expectedSelect = new Select(groupBy);
            expectedSelect.AddRowTransformer(new IdentityRowTransformer());

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_GroupByWithOrderBy()
        {
            // Arrange
            const string query = @"SELECT * FROM table AS t GROUP BY t.id ORDER BY t.id, t.name";

            var selector = new TopLevelSelector("table");
            var from = new From(selector, "t");
            var orderBy = new OrderBy(new[]
            {
                new ColumnOrdering(new ColumnHeader("t", "id")),
                new ColumnOrdering(new ColumnHeader("t", "name"))
            });
            var groupBy = new GroupBy(from, new[]
            {
                new GroupingSet(new[]
                {
                    new ColumnHeader("t", "id")
                })
            });
            var expectedSelect = new Select(groupBy, orderBy);
            expectedSelect.AddRowTransformer(new IdentityRowTransformer());

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_GroupByWithOrderByWithWhere()
        {
            // Arrange
            const string query = @"SELECT * FROM table AS t WHERE t.id = '5' GROUP BY t.id ORDER BY t.id, t.name";

            var selector = new TopLevelSelector("table");
            var from = new From(selector, "t");
            var predicate = new EqualPredicate(new GetCellRowExpression(new ColumnHeader("t", "id")),
                new ConstantRowExpression("5"));
            var where = new Where(from, predicate);
            var orderBy = new OrderBy(new[]
            {
                new ColumnOrdering(new ColumnHeader("t", "id")),
                new ColumnOrdering(new ColumnHeader("t", "name"))
            });
            var groupBy = new GroupBy(where, new[]
            {
                new GroupingSet(new[]
                {
                    new ColumnHeader("t", "id")
                })
            });
            var expectedSelect = new Select(groupBy, orderBy);
            expectedSelect.AddRowTransformer(new IdentityRowTransformer());

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_GroupByWithWhere()
        {
            // Arrange
            const string query = @"SELECT * FROM table AS t WHERE t.id <> '5' GROUP BY t.id, t.name";

            var selector = new TopLevelSelector("table");
            var from = new From(selector, "t");
            var predicate = new NotEqualPredicate(new GetCellRowExpression(new ColumnHeader("t", "id")),
                new ConstantRowExpression("5"));
            var where = new Where(from, predicate);
            var groupBy = new GroupBy(where, new[]
            {
                new GroupingSet(new[]
                {
                    new ColumnHeader("t", "id"),
                    new ColumnHeader("t", "name")
                })
            });
            var expectedSelect = new Select(groupBy);
            expectedSelect.AddRowTransformer(new IdentityRowTransformer());

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_IncorrectQuery_ShouldReturnMessage()
        {
            // Arrange
            const string query = @"SELECT * FROM table";

            var expectedMessages = new[]
            {
                "Syntax error - unexpected EOF "
            };

            // Act
            // Assert
            PerformTestWithAssert(query, null, expectedMessages);
        }

        [Test]
        public void Parse_InnerJoin_WithUsingAliasInSelector()
        {
            // Arrange
            const string query = @"SELECT * FROM table AS t INNER JOIN t.table2 AS t2 ON t.id = t2.id";

            var firstSelector = new TopLevelSelector("table");
            var secondSelector = new ChainedSelector(new TopLevelSelector("table"), new NodeSelector("table2"));
            var firstFrom = new From(firstSelector, "t");
            var secondFrom = new From(secondSelector, "t2");
            var innerJoin = new InnerJoin(firstFrom, secondFrom,
                new EqualPredicate(new GetCellRowExpression(new ColumnHeader("t", "id")),
                    new GetCellRowExpression(new ColumnHeader("t2", "id"))));
            var expectedSelect = new Select(innerJoin);
            expectedSelect.AddRowTransformer(new IdentityRowTransformer());

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_InnerJoin_WithoutUsingAliasInSelector()
        {
            // Arrange
            const string query = @"SELECT * FROM table AS t INNER JOIN table2 AS t2 ON t.id = t2.id";

            var firstSelector = new TopLevelSelector("table");
            var secondSelector = new TopLevelSelector("table2");
            var firstFrom = new From(firstSelector, "t");
            var secondFrom = new From(secondSelector, "t2");
            var innerJoin = new InnerJoin(firstFrom, secondFrom,
                new EqualPredicate(new GetCellRowExpression(new ColumnHeader("t", "id")),
                    new GetCellRowExpression(new ColumnHeader("t2", "id"))));
            var expectedSelect = new Select(innerJoin);
            expectedSelect.AddRowTransformer(new IdentityRowTransformer());

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }


        [Test]
        public void Parse_LeftJoin_WithUsingAliasInSelector()
        {
            // Arrange
            const string query = @"SELECT * FROM table AS t LEFT JOIN t.table2 AS t2 ON t.id = t2.id";

            var firstSelector = new TopLevelSelector("table");
            var secondSelector = new ChainedSelector(new TopLevelSelector("table"), new NodeSelector("table2"));
            var firstFrom = new From(firstSelector, "t");
            var secondFrom = new From(secondSelector, "t2");
            var outerJoin = new LeftOuterJoin(firstFrom, secondFrom,
                new EqualPredicate(new GetCellRowExpression(new ColumnHeader("t", "id")),
                    new GetCellRowExpression(new ColumnHeader("t2", "id"))));
            var expectedSelect = new Select(outerJoin);
            expectedSelect.AddRowTransformer(new IdentityRowTransformer());

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_LeftJoin_WithoutUsingAliasInSelector()
        {
            // Arrange
            const string query = @"SELECT * FROM table AS t LEFT JOIN table2 AS t2 ON t.id = t2.id";

            var firstSelector = new TopLevelSelector("table");
            var secondSelector = new TopLevelSelector("table2");
            var firstFrom = new From(firstSelector, "t");
            var secondFrom = new From(secondSelector, "t2");
            var outerJoin = new LeftOuterJoin(firstFrom, secondFrom,
                new EqualPredicate(new GetCellRowExpression(new ColumnHeader("t", "id")),
                    new GetCellRowExpression(new ColumnHeader("t2", "id"))));
            var expectedSelect = new Select(outerJoin);
            expectedSelect.AddRowTransformer(new IdentityRowTransformer());

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_LeftOuterJoin_WithUsingAliasInSelector()
        {
            // Arrange
            const string query = @"SELECT * FROM table AS t LEFT OUTER JOIN t.table2 AS t2 ON t.id = t2.id";

            var firstSelector = new TopLevelSelector("table");
            var secondSelector = new ChainedSelector(new TopLevelSelector("table"), new NodeSelector("table2"));
            var firstFrom = new From(firstSelector, "t");
            var secondFrom = new From(secondSelector, "t2");
            var outerJoin = new LeftOuterJoin(firstFrom, secondFrom,
                new EqualPredicate(new GetCellRowExpression(new ColumnHeader("t", "id")),
                    new GetCellRowExpression(new ColumnHeader("t2", "id"))));
            var expectedSelect = new Select(outerJoin);
            expectedSelect.AddRowTransformer(new IdentityRowTransformer());

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_LeftOuterJoin_WithoutUsingAliasInSelector()
        {
            // Arrange
            const string query = @"SELECT * FROM table AS t LEFT OUTER JOIN table2 AS t2 ON t.id = t2.id";

            var firstSelector = new TopLevelSelector("table");
            var secondSelector = new TopLevelSelector("table2");
            var firstFrom = new From(firstSelector, "t");
            var secondFrom = new From(secondSelector, "t2");
            var outerJoin = new LeftOuterJoin(firstFrom, secondFrom,
                new EqualPredicate(new GetCellRowExpression(new ColumnHeader("t", "id")),
                    new GetCellRowExpression(new ColumnHeader("t2", "id"))));
            var expectedSelect = new Select(outerJoin);
            expectedSelect.AddRowTransformer(new IdentityRowTransformer());

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_NaturalJoinWithUsingAliasInSelector()
        {
            // Arrange
            const string query = @"SELECT * FROM table AS t NATURAL JOIN t.table2 AS t2";

            var firstSelector = new TopLevelSelector("table");
            var secondSelector = new ChainedSelector(new TopLevelSelector("table"), new NodeSelector("table2"));
            var firstFrom = new From(firstSelector, "t");
            var secondFrom = new From(secondSelector, "t2");
            var naturalJoin = new NaturalJoin(firstFrom, secondFrom);
            var expectedSelect = new Select(naturalJoin);
            expectedSelect.AddRowTransformer(new IdentityRowTransformer());

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_NaturalJoinWithoutUsingAliasInSelector()
        {
            // Arrange
            const string query = @"SELECT * FROM table AS t NATURAL JOIN table2 AS t2";

            var firstSelector = new TopLevelSelector("table");
            var secondSelector = new TopLevelSelector("table2");
            var firstFrom = new From(firstSelector, "t");
            var secondFrom = new From(secondSelector, "t2");
            var naturalJoin = new NaturalJoin(firstFrom, secondFrom);
            var expectedSelect = new Select(naturalJoin);
            expectedSelect.AddRowTransformer(new IdentityRowTransformer());

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_NaturalJoin_MultipleOccurences()
        {
            // Arrange
            const string query = @"SELECT * FROM table AS t NATURAL JOIN table2 AS t2 NATURAL JOIN table3 AS t3";

            var firstSelector = new TopLevelSelector("table");
            var secondSelector = new TopLevelSelector("table2");
            var thirdSelector = new TopLevelSelector("table3");
            var firstFrom = new From(firstSelector, "t");
            var secondFrom = new From(secondSelector, "t2");
            var thirdFrom = new From(thirdSelector, "t3");
            var firstJoin = new NaturalJoin(firstFrom, secondFrom);
            var secondJoin = new NaturalJoin(firstJoin, thirdFrom);
            var expectedSelect = new Select(secondJoin);
            expectedSelect.AddRowTransformer(new IdentityRowTransformer());

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_OrderBy()
        {
            // Arrange
            const string query = @"SELECT * FROM table AS t ORDER BY t.id, t.name";

            var selector = new TopLevelSelector("table");
            var from = new From(selector, "t");
            var orderBy = new OrderBy(new[]
            {
                new ColumnOrdering(new ColumnHeader("t", "id")),
                new ColumnOrdering(new ColumnHeader("t", "name"))
            });
            var expectedSelect = new Select(from, orderBy);
            expectedSelect.AddRowTransformer(new IdentityRowTransformer());

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_OrderByWithDescendingColumn()
        {
            // Arrange
            const string query = @"SELECT * FROM table AS t ORDER BY t.id DESC";

            var selector = new TopLevelSelector("table");
            var from = new From(selector, "t");
            var orderBy = new OrderBy(new[]
            {
                new ColumnOrdering(new ColumnHeader("t", "id"), OrderDirection.Descending)
            });
            var expectedSelect = new Select(from, orderBy);
            expectedSelect.AddRowTransformer(new IdentityRowTransformer());

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_OrderByWithFetchCount()
        {
            // Arrange
            const string query = @"SELECT * FROM table AS t ORDER BY t.id, t.name FETCH 3";

            var selector = new TopLevelSelector("table");
            var from = new From(selector, "t");
            var orderBy = new OrderBy(new[]
            {
                new ColumnOrdering(new ColumnHeader("t", "id")),
                new ColumnOrdering(new ColumnHeader("t", "name"))
            }, null, 3);
            var expectedSelect = new Select(from, orderBy);
            expectedSelect.AddRowTransformer(new IdentityRowTransformer());

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_OrderByWithSkipCount()
        {
            // Arrange
            const string query = @"SELECT * FROM table AS t ORDER BY t.id, t.name SKIP 3";

            var selector = new TopLevelSelector("table");
            var from = new From(selector, "t");
            var orderBy = new OrderBy(new[]
            {
                new ColumnOrdering(new ColumnHeader("t", "id")),
                new ColumnOrdering(new ColumnHeader("t", "name"))
            }, 3);
            var expectedSelect = new Select(from, orderBy);
            expectedSelect.AddRowTransformer(new IdentityRowTransformer());

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_OrderByWithSkipCountAndFetchCount()
        {
            // Arrange
            const string query = @"SELECT * FROM table AS t ORDER BY t.id, t.name SKIP 2 FETCH 3";

            var selector = new TopLevelSelector("table");
            var from = new From(selector, "t");
            var orderBy = new OrderBy(new[]
            {
                new ColumnOrdering(new ColumnHeader("t", "id")),
                new ColumnOrdering(new ColumnHeader("t", "name"))
            }, 2, 3);
            var expectedSelect = new Select(from, orderBy);
            expectedSelect.AddRowTransformer(new IdentityRowTransformer());

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_OrderByWithWhere()
        {
            // Arrange
            const string query = @"SELECT * FROM table AS t WHERE t.id = '5' ORDER BY t.id, t.name";

            var selector = new TopLevelSelector("table");
            var from = new From(selector, "t");
            var predicate = new EqualPredicate(new GetCellRowExpression(new ColumnHeader("t", "id")),
                new ConstantRowExpression("5"));
            var where = new Where(from, predicate);
            var orderBy = new OrderBy(new[]
            {
                new ColumnOrdering(new ColumnHeader("t", "id")),
                new ColumnOrdering(new ColumnHeader("t", "name"))
            });
            var expectedSelect = new Select(where, orderBy);
            expectedSelect.AddRowTransformer(new IdentityRowTransformer());

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }


        [Test]
        public void Parse_RightJoin_WithUsingAliasInSelector()
        {
            // Arrange
            const string query = @"SELECT * FROM table AS t RIGHT JOIN t.table2 AS t2 ON t.id = t2.id";

            var firstSelector = new TopLevelSelector("table");
            var secondSelector = new ChainedSelector(new TopLevelSelector("table"), new NodeSelector("table2"));
            var firstFrom = new From(firstSelector, "t");
            var secondFrom = new From(secondSelector, "t2");
            var outerJoin = new RightOuterJoin(firstFrom, secondFrom,
                new EqualPredicate(new GetCellRowExpression(new ColumnHeader("t", "id")),
                    new GetCellRowExpression(new ColumnHeader("t2", "id"))));
            var expectedSelect = new Select(outerJoin);
            expectedSelect.AddRowTransformer(new IdentityRowTransformer());

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_RightJoin_WithoutUsingAliasInSelector()
        {
            // Arrange
            const string query = @"SELECT * FROM table AS t RIGHT JOIN table2 AS t2 ON t.id = t2.id";

            var firstSelector = new TopLevelSelector("table");
            var secondSelector = new TopLevelSelector("table2");
            var firstFrom = new From(firstSelector, "t");
            var secondFrom = new From(secondSelector, "t2");
            var outerJoin = new RightOuterJoin(firstFrom, secondFrom,
                new EqualPredicate(new GetCellRowExpression(new ColumnHeader("t", "id")),
                    new GetCellRowExpression(new ColumnHeader("t2", "id"))));
            var expectedSelect = new Select(outerJoin);
            expectedSelect.AddRowTransformer(new IdentityRowTransformer());

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_RightOuterJoin_WithUsingAliasInSelector()
        {
            // Arrange
            const string query = @"SELECT * FROM table AS t RIGHT OUTER JOIN t.table2 AS t2 ON t.id = t2.id";

            var firstSelector = new TopLevelSelector("table");
            var secondSelector = new ChainedSelector(new TopLevelSelector("table"), new NodeSelector("table2"));
            var firstFrom = new From(firstSelector, "t");
            var secondFrom = new From(secondSelector, "t2");
            var outerJoin = new RightOuterJoin(firstFrom, secondFrom,
                new EqualPredicate(new GetCellRowExpression(new ColumnHeader("t", "id")),
                    new GetCellRowExpression(new ColumnHeader("t2", "id"))));
            var expectedSelect = new Select(outerJoin);
            expectedSelect.AddRowTransformer(new IdentityRowTransformer());

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_RightOuterJoin_WithoutUsingAliasInSelector()
        {
            // Arrange
            const string query = @"SELECT * FROM table AS t RIGHT OUTER JOIN table2 AS t2 ON t.id = t2.id";

            var firstSelector = new TopLevelSelector("table");
            var secondSelector = new TopLevelSelector("table2");
            var firstFrom = new From(firstSelector, "t");
            var secondFrom = new From(secondSelector, "t2");
            var outerJoin = new RightOuterJoin(firstFrom, secondFrom,
                new EqualPredicate(new GetCellRowExpression(new ColumnHeader("t", "id")),
                    new GetCellRowExpression(new ColumnHeader("t2", "id"))));
            var expectedSelect = new Select(outerJoin);
            expectedSelect.AddRowTransformer(new IdentityRowTransformer());

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_Select_ColumnAggregatesWithAliases()
        {
            // Arrange
            const string query = @"SELECT MIN(t.id) AS minimum, MAX(t.id) AS maximum FROM table AS t";

            var selector = new TopLevelSelector("table");
            var from = new From(selector, "t");
            var expectedSelect = new Select(from);
            expectedSelect.AddAggregate(
                new MinimumAggregate(new GetOriginalCellCellExpression(new ColumnHeader("t", "id")),
                    new GetOriginalCellCellExpression(new ColumnHeader("", "minimum")), "minimum"));
            expectedSelect.AddAggregate(
                new MaximumAggregate(new GetOriginalCellCellExpression(new ColumnHeader("t", "id")),
                    new GetOriginalCellCellExpression(new ColumnHeader("", "maximum")), "maximum"));
            expectedSelect.AddRowTransformer(new CellsRowTransformer(new ICellTransformer[0]));

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_Select_ColumnAggregatesWithInnerFunctions()
        {
            // Arrange
            const string query = @"SELECT MIN(LEN(TRIM(t.id))) FROM table AS t";

            var selector = new TopLevelSelector("table");
            var from = new From(selector, "t");
            var expectedSelect = new Select(from);
            var columnHeader = new ColumnHeader("t", "id");
            expectedSelect.AddAggregate(
                new MinimumAggregate(
                    new ChainedCellExpression(new GetLengthCellExpression(columnHeader),
                        new ChainedCellExpression(new GetTrimmedCellExpression(columnHeader),
                            new GetOriginalCellCellExpression(columnHeader)))
                    , new GetOriginalCellCellExpression(new ColumnHeader("", ""))));
            expectedSelect.AddRowTransformer(new CellsRowTransformer(new ICellTransformer[0]));

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_Select_ColumnAggregatesWithInnerFunctionsAndOuterFunctions()
        {
            // Arrange
            const string query = @"SELECT LEN(MIN(LEN(TRIM(t.id)))) FROM table AS t";

            var selector = new TopLevelSelector("table");
            var from = new From(selector, "t");
            var expectedSelect = new Select(from);
            var columnHeader = new ColumnHeader("t", "id");
            expectedSelect.AddAggregate(
                new MinimumAggregate(
                    new ChainedCellExpression(new GetLengthCellExpression(columnHeader),
                        new ChainedCellExpression(new GetTrimmedCellExpression(columnHeader),
                            new GetOriginalCellCellExpression(columnHeader)))
                    ,
                    new ChainedCellExpression(new GetLengthCellExpression(new ColumnHeader("", "")),
                        new GetOriginalCellCellExpression(new ColumnHeader("", "")))));
            expectedSelect.AddRowTransformer(new CellsRowTransformer(new ICellTransformer[0]));

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }


        [Test]
        public void Parse_Select_ColumnAggregatesWithoutAliases()
        {
            // Arrange
            const string query = @"SELECT MIN(t.id), MAX(t.id), SUM(t.id), AVG(t.id), COUNT(t.id) FROM table AS t";

            var selector = new TopLevelSelector("table");
            var from = new From(selector, "t");
            var expectedSelect = new Select(from);
            expectedSelect.AddAggregate(
                new MinimumAggregate(new GetOriginalCellCellExpression(new ColumnHeader("t", "id")),
                    new GetOriginalCellCellExpression(new ColumnHeader("", ""))));
            expectedSelect.AddAggregate(
                new MaximumAggregate(new GetOriginalCellCellExpression(new ColumnHeader("t", "id")),
                    new GetOriginalCellCellExpression(new ColumnHeader("", ""))));
            expectedSelect.AddAggregate(new SumAggregate(
                new GetOriginalCellCellExpression(new ColumnHeader("t", "id")),
                new GetOriginalCellCellExpression(new ColumnHeader("", ""))));
            expectedSelect.AddAggregate(
                new AverageAggregate(new GetOriginalCellCellExpression(new ColumnHeader("t", "id")),
                    new GetOriginalCellCellExpression(new ColumnHeader("", ""))));
            expectedSelect.AddAggregate(
                new CountColumnAggregate(new GetOriginalCellCellExpression(new ColumnHeader("t", "id")),
                    new GetOriginalCellCellExpression(new ColumnHeader("", ""))));
            expectedSelect.AddRowTransformer(new CellsRowTransformer(new ICellTransformer[0]));

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_Select_CountAllAggregate()
        {
            // Arrange
            const string query = @"SELECT COUNT(*) FROM table AS t";

            var selector = new TopLevelSelector("table");
            var from = new From(selector, "t");
            var expectedSelect = new Select(from);
            expectedSelect.AddAggregate(new CountAggregate(new GetOriginalCellCellExpression(new ColumnHeader("", ""))));
            expectedSelect.AddRowTransformer(new CellsRowTransformer(new ICellTransformer[0]));

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_Select_ParseFunctionWithManyArguments()
        {
            // Arrange
            const string query = @"SELECT SUBSTR(t.id, '2', '5'), CATENATE(t.id, 'asd') FROM table AS t";

            var selector = new TopLevelSelector("table");
            var from = new From(selector, "t");
            var expectedSelect = new Select(from);
            var columnHeader = new ColumnHeader("t", "id");
            expectedSelect.AddRowTransformer(new CellsRowTransformer(new[]
            {
                new CellTransformer(
                    new ChainedCellExpression(new GetSubstringCellExpression(columnHeader, 2, 5),
                        new GetOriginalCellCellExpression(columnHeader)), ""),
                new CellTransformer(
                    new ChainedCellExpression(new CatenateStringCellExpression(columnHeader, "asd"),
                        new GetOriginalCellCellExpression(columnHeader)), "")
            }));

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_Select_ParseFunctions()
        {
            // Arrange
            const string query =
                @"SELECT LEN(TRIM(t.id)), YEAR(t.id), MONTH(t.id), DAY(t.id), HOUR(t.id), MINUTE(t.id), SECOND(t.id) FROM table AS t";

            var selector = new TopLevelSelector("table");
            var from = new From(selector, "t");
            var expectedSelect = new Select(from);
            var columnHeader = new ColumnHeader("t", "id");
            expectedSelect.AddRowTransformer(new CellsRowTransformer(new[]
            {
                new CellTransformer(new ChainedCellExpression(new GetLengthCellExpression(columnHeader),
                    new ChainedCellExpression(new GetTrimmedCellExpression(columnHeader),
                        new GetOriginalCellCellExpression(columnHeader))), ""),
                new CellTransformer(
                    new ChainedCellExpression(new GetDatePartCellExpression(columnHeader, DatePart.Year),
                        new GetOriginalCellCellExpression(columnHeader)), ""),
                new CellTransformer(
                    new ChainedCellExpression(new GetDatePartCellExpression(columnHeader, DatePart.Month),
                        new GetOriginalCellCellExpression(columnHeader)), ""),
                new CellTransformer(
                    new ChainedCellExpression(new GetDatePartCellExpression(columnHeader, DatePart.Day),
                        new GetOriginalCellCellExpression(columnHeader)), ""),
                new CellTransformer(
                    new ChainedCellExpression(new GetDatePartCellExpression(columnHeader, DatePart.Hour),
                        new GetOriginalCellCellExpression(columnHeader)), ""),
                new CellTransformer(
                    new ChainedCellExpression(new GetDatePartCellExpression(columnHeader, DatePart.Minute),
                        new GetOriginalCellCellExpression(columnHeader)), ""),
                new CellTransformer(
                    new ChainedCellExpression(new GetDatePartCellExpression(columnHeader, DatePart.Second),
                        new GetOriginalCellCellExpression(columnHeader)), "")
            }));

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_Select_SelectColumnsWithAlias()
        {
            // Arrange
            const string query = @"SELECT t.id AS ajdik, t.name as nejm FROM table AS t";

            var selector = new TopLevelSelector("table");
            var from = new From(selector, "t");
            var expectedSelect = new Select(from);
            expectedSelect.AddRowTransformer(new CellsRowTransformer(new ICellTransformer[]
            {
                new CellTransformer(new GetOriginalCellCellExpression(new ColumnHeader("t", "id")), "ajdik"),
                new CellTransformer(new GetOriginalCellCellExpression(new ColumnHeader("t", "name")), "nejm")
            }));

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_Select_SelectManyColumns()
        {
            // Arrange
            const string query = @"SELECT t.id, t.name FROM table AS t";

            var selector = new TopLevelSelector("table");
            var from = new From(selector, "t");
            var expectedSelect = new Select(from);
            expectedSelect.AddRowTransformer(new CellsRowTransformer(new ICellTransformer[]
            {
                new CellTransformer(new GetOriginalCellCellExpression(new ColumnHeader("t", "id"))),
                new CellTransformer(new GetOriginalCellCellExpression(new ColumnHeader("t", "name")))
            }));

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_Select_SelectSingleColumn()
        {
            // Arrange
            const string query = @"SELECT t.id FROM table AS t";

            var selector = new TopLevelSelector("table");
            var from = new From(selector, "t");
            var expectedSelect = new Select(from);
            expectedSelect.AddRowTransformer(new CellsRowTransformer(new ICellTransformer[]
            {
                new CellTransformer(new GetOriginalCellCellExpression(new ColumnHeader("t", "id")))
            }));

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_Where_AndOperator()
        {
            // Arrange
            const string query = @"SELECT * FROM table AS t where t.id = '5' AND t.name = 'JAN'";

            var selector = new TopLevelSelector("table");
            var from = new From(selector, "t");
            var predicate =
                new AndPredicate(
                    new EqualPredicate(new GetCellRowExpression(new ColumnHeader("t", "id")),
                        new ConstantRowExpression("5")),
                    new EqualPredicate(new GetCellRowExpression(new ColumnHeader("t", "name")),
                        new ConstantRowExpression("JAN")));
            var where = new Where(from, predicate);
            var expectedSelect = new Select(where);
            expectedSelect.AddRowTransformer(new IdentityRowTransformer());

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_Where_AndOrOperator()
        {
            // Arrange
            const string query = @"SELECT * FROM table AS t where t.id = '5' AND t.name = 'JAN' OR t.name = 'GERWAZY' ";

            var selector = new TopLevelSelector("table");
            var from = new From(selector, "t");
            var predicate = new OrPredicate(
                new AndPredicate(
                    new EqualPredicate(new GetCellRowExpression(new ColumnHeader("t", "id")),
                        new ConstantRowExpression("5")),
                    new EqualPredicate(new GetCellRowExpression(new ColumnHeader("t", "name")),
                        new ConstantRowExpression("JAN"))),
                new EqualPredicate(new GetCellRowExpression(new ColumnHeader("t", "name")),
                    new ConstantRowExpression("GERWAZY")));
            var where = new Where(from, predicate);
            var expectedSelect = new Select(where);
            expectedSelect.AddRowTransformer(new IdentityRowTransformer());

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_Where_EqualOperator()
        {
            // Arrange
            const string query = @"SELECT * FROM table AS t where t.id = '5'";

            var selector = new TopLevelSelector("table");
            var from = new From(selector, "t");
            var predicate = new EqualPredicate(new GetCellRowExpression(new ColumnHeader("t", "id")),
                new ConstantRowExpression("5"));
            var where = new Where(from, predicate);
            var expectedSelect = new Select(where);
            expectedSelect.AddRowTransformer(new IdentityRowTransformer());

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_Where_GreaterEqualOperator()
        {
            // Arrange
            const string query = @"SELECT * FROM table AS t where t.id >= '5'";

            var selector = new TopLevelSelector("table");
            var from = new From(selector, "t");
            var predicate = new GreaterEqualPredicate(new GetCellRowExpression(new ColumnHeader("t", "id")),
                new ConstantRowExpression("5"));
            var where = new Where(from, predicate);
            var expectedSelect = new Select(where);
            expectedSelect.AddRowTransformer(new IdentityRowTransformer());

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_Where_GreaterThanOperator()
        {
            // Arrange
            const string query = @"SELECT * FROM table AS t where t.id > '5'";

            var selector = new TopLevelSelector("table");
            var from = new From(selector, "t");
            var predicate = new GreaterThanPredicate(new GetCellRowExpression(new ColumnHeader("t", "id")),
                new ConstantRowExpression("5"));
            var where = new Where(from, predicate);
            var expectedSelect = new Select(where);
            expectedSelect.AddRowTransformer(new IdentityRowTransformer());

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_Where_IsNullOperator()
        {
            // Arrange
            const string query = @"SELECT * FROM table AS t where t.id IS NULL";

            var selector = new TopLevelSelector("table");
            var from = new From(selector, "t");
            var predicate = new IsNullPredicate(new GetCellRowExpression(new ColumnHeader("t", "id")));
            var where = new Where(from, predicate);
            var expectedSelect = new Select(where);
            expectedSelect.AddRowTransformer(new IdentityRowTransformer());

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_Where_LessEqualOperator()
        {
            // Arrange
            const string query = @"SELECT * FROM table AS t where t.id <= '5'";

            var selector = new TopLevelSelector("table");
            var from = new From(selector, "t");
            var predicate = new LessEqualPredicate(new GetCellRowExpression(new ColumnHeader("t", "id")),
                new ConstantRowExpression("5"));
            var where = new Where(from, predicate);
            var expectedSelect = new Select(where);
            expectedSelect.AddRowTransformer(new IdentityRowTransformer());

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_Where_LessThanOperator()
        {
            // Arrange
            const string query = @"SELECT * FROM table AS t where t.id < '5'";

            var selector = new TopLevelSelector("table");
            var from = new From(selector, "t");
            var predicate = new LessThanPredicate(new GetCellRowExpression(new ColumnHeader("t", "id")),
                new ConstantRowExpression("5"));
            var where = new Where(from, predicate);
            var expectedSelect = new Select(where);
            expectedSelect.AddRowTransformer(new IdentityRowTransformer());

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_Where_LikeOperatorWithEscapeCharacter()
        {
            // Arrange
            const string query = @"SELECT * FROM table AS t where t.id LIKE 'abcd_%[a-z][0-9][^0-9]' ESCAPE '&'";

            var selector = new TopLevelSelector("table");
            var from = new From(selector, "t");
            var predicate = new LikePredicate(new GetCellRowExpression(new ColumnHeader("t", "id")),
                new ConstantRowExpression("abcd_%[a-z][0-9][^0-9]"), '&');
            var where = new Where(from, predicate);
            var expectedSelect = new Select(where);
            expectedSelect.AddRowTransformer(new IdentityRowTransformer());

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_Where_LikeOperatorWithoutEscapeCharacter()
        {
            // Arrange
            const string query = @"SELECT * FROM table AS t where t.id LIKE 'abcd_%[a-z][^0-9]'";

            var selector = new TopLevelSelector("table");
            var from = new From(selector, "t");
            var predicate = new LikePredicate(new GetCellRowExpression(new ColumnHeader("t", "id")),
                new ConstantRowExpression("abcd_%[a-z][^0-9]"), '\0');
            var where = new Where(from, predicate);
            var expectedSelect = new Select(where);
            expectedSelect.AddRowTransformer(new IdentityRowTransformer());

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_Where_NotEqualOperator()
        {
            // Arrange
            const string query = @"SELECT * FROM table AS t where t.id <> '5'";

            var selector = new TopLevelSelector("table");
            var from = new From(selector, "t");
            var predicate = new NotEqualPredicate(new GetCellRowExpression(new ColumnHeader("t", "id")),
                new ConstantRowExpression("5"));
            var where = new Where(from, predicate);
            var expectedSelect = new Select(where);
            expectedSelect.AddRowTransformer(new IdentityRowTransformer());

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_Where_NotOperator()
        {
            // Arrange
            const string query = @"SELECT * FROM table AS t where NOT (t.id >= '5')";

            var selector = new TopLevelSelector("table");
            var from = new From(selector, "t");
            var predicate =
                new NotPredicate(new GreaterEqualPredicate(new GetCellRowExpression(new ColumnHeader("t", "id")),
                    new ConstantRowExpression("5")));
            var where = new Where(from, predicate);
            var expectedSelect = new Select(where);
            expectedSelect.AddRowTransformer(new IdentityRowTransformer());

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_Where_OrAndOperator()
        {
            // Arrange
            const string query = @"SELECT * FROM table AS t where t.name = 'GERWAZY' OR t.name = 'JAN' AND t.id = '5'";

            var selector = new TopLevelSelector("table");
            var from = new From(selector, "t");
            var predicate = new OrPredicate(new EqualPredicate(new GetCellRowExpression(new ColumnHeader("t", "name")),
                new ConstantRowExpression("GERWAZY")),
                new AndPredicate(
                    new EqualPredicate(new GetCellRowExpression(new ColumnHeader("t", "name")),
                        new ConstantRowExpression("JAN")),
                    new EqualPredicate(new GetCellRowExpression(new ColumnHeader("t", "id")),
                        new ConstantRowExpression("5"))));
            var where = new Where(from, predicate);
            var expectedSelect = new Select(where);
            expectedSelect.AddRowTransformer(new IdentityRowTransformer());

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_Where_OrOperator()
        {
            // Arrange
            const string query = @"SELECT * FROM table AS t where t.id = '5' OR t.name = 'JAN'";

            var selector = new TopLevelSelector("table");
            var from = new From(selector, "t");
            var predicate =
                new OrPredicate(
                    new EqualPredicate(new GetCellRowExpression(new ColumnHeader("t", "id")),
                        new ConstantRowExpression("5")),
                    new EqualPredicate(new GetCellRowExpression(new ColumnHeader("t", "name")),
                        new ConstantRowExpression("JAN")));
            var where = new Where(from, predicate);
            var expectedSelect = new Select(where);
            expectedSelect.AddRowTransformer(new IdentityRowTransformer());

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }

        [Test]
        public void Parse_Where_Parenthesis()
        {
            // Arrange
            const string query = @"SELECT * FROM table AS t where t.id = '5' AND (t.name = 'JAN' OR t.name = 'GERWAZY')";

            var selector = new TopLevelSelector("table");
            var from = new From(selector, "t");
            var predicate = new AndPredicate(new EqualPredicate(new GetCellRowExpression(new ColumnHeader("t", "id")),
                new ConstantRowExpression("5")),
                new OrPredicate(
                    new EqualPredicate(new GetCellRowExpression(new ColumnHeader("t", "name")),
                        new ConstantRowExpression("JAN")),
                    new EqualPredicate(new GetCellRowExpression(new ColumnHeader("t", "name")),
                        new ConstantRowExpression("GERWAZY"))));
            var where = new Where(from, predicate);
            var expectedSelect = new Select(where);
            expectedSelect.AddRowTransformer(new IdentityRowTransformer());

            var expectedMessages = new string[0];

            // Act
            // Assert
            PerformTestWithAssert(query, expectedSelect, expectedMessages);
        }
    }
}

If you followed this series then this whole code should be very easy.

Summary

We now have engine for parsing XML documents, transforming them, and executing queries. We can easily transform rows, perform set operations, implement functions. There are things in SQL-92 which we miss, e.g., nested queries. However, we can easily extend our language and engine to handle them. For now the job is done but if you want you can easily extend the SQLxD database even more.

]]>
https://blog.adamfurmanek.pl/2017/10/21/sqlxd-part-23/feed/ 0
SQLxD Part 22 — Query parser https://blog.adamfurmanek.pl/2017/10/14/sqlxd-part-22/ https://blog.adamfurmanek.pl/2017/10/14/sqlxd-part-22/#respond Sat, 14 Oct 2017 08:00:48 +0000 https://blog.adamfurmanek.pl/?p=2217 Continue reading SQLxD Part 22 — Query parser]]>

This is the twenty second part of the SQLxD series. For your convenience you can find other parts in the table of contents in Part 1 – XML Transformation

For parsing queries we use IronPython. Let’s begin with C# code for executing PLY engine:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Reflection;
using IronPython.Hosting;
using IronPython.Modules;
using Microsoft.Scripting.Hosting;
using QueryLogic.Filtering;
using QueryLogic.Joins.Abstraction;
using QueryLogic.RelationProviding;
using QueryLogic.Selecting;
using QueryLogic.Selectors;

namespace QueryParser.Parsing
{
    public class Parser
    {
        private static dynamic _ipy;
        private Dictionary<string, ISelector> _aliasedSelectors;

        public Parser()
        {
            _ipy = _ipy ?? CreateEngine();
        }

        private dynamic CreateEngine()
        {
            ScriptRuntimeSetup setup = Python.CreateRuntimeSetup(GetRuntimeOptions());
            var pyRuntime = new ScriptRuntime(setup);
            ScriptEngine engineInstance = Python.GetEngine(pyRuntime);

            AddPythonLibrariesToSysMetaPath(engineInstance);

            dynamic ipy = pyRuntime.UseFile(@"Parsing\Parser.py");
            ipy.initialize(GetPlyPath());

            return ipy;
        }

        private void AddPythonLibrariesToSysMetaPath(ScriptEngine engineInstance)
        {
            Assembly asm = GetType().Assembly;
            IEnumerable<string> resQuery =
                from name in asm.GetManifestResourceNames()
                where name.ToLowerInvariant().EndsWith("python27.zip")
                select name;
            string resName = resQuery.Single();
            var importer = new ResourceMetaPathImporter(asm, resName);
            dynamic sys = engineInstance.GetSysModule();
            sys.meta_path.append(importer);
            sys.path.append(importer);
        }

        private static Dictionary<string, object> GetRuntimeOptions()
        {
            var options = new Dictionary<string, object>();
            options["Debug"] = false;
            return options;
        }

        private static string GetPlyPath()
        {
            return Path.Combine(Environment.CurrentDirectory, "Lib", "ply");
        }

        public Tuple<Select, string[]> Parse(string content)
        {
            Tuple<Select, string[]> result;
            try
            {
                result = _ipy.parse(content);
            }
            catch (Exception e)
            {
                return Tuple.Create<Select, string[]>(null, new[]
                {
                    string.Format("{0}\n{1}\n{2}",e.Message, e.InnerException, e.StackTrace)
                });
            }
            FixSelectors(result.Item1);
            result = Tuple.Create(result.Item1, result.Item2);

            return result;
        }

        private void FixSelectors(Select result)
        {
            if (result == null)
            {
                return;
            }

            _aliasedSelectors = new Dictionary<string, ISelector>();
            IRelationProvider sourceProvider = result.GroupBy.SourceRelationProvider;
            FixSelectorInSourceRelationProvider(sourceProvider);
        }

        private void FixSelectorInSourceRelationProvider(IRelationProvider sourceRelationProvider)
        {
            if (sourceRelationProvider is From)
            {
                FixSelectorInFrom(sourceRelationProvider as From);
            }
            else if (sourceRelationProvider is IJoin)
            {
                FixSelectorInJoin(sourceRelationProvider as IJoin);
            }
            else if (sourceRelationProvider is Where)
            {
                FixSelectorInWhere(sourceRelationProvider as Where);
            }
        }

        private void FixSelectorInWhere(Where @where)
        {
            FixSelectorInSourceRelationProvider(@where.SourceRelation);
        }

        private void FixSelectorInJoin(IJoin @join)
        {
            FixSelectorInSourceRelationProvider(@join.FirstRelation);
            FixSelectorInSourceRelationProvider(@join.SecondRelation);
        }

        private void FixSelectorInFrom(From sourceProvider)
        {
            ISelector selector = sourceProvider.Selector;
            if (selector is NodeSelector)
            {
                sourceProvider.Selector = GetRealSelectorForNodeSelector(selector);
            }
            else if (selector is ChainedSelector)
            {
                sourceProvider.Selector = FixChainedSelector(selector as ChainedSelector);
            }

            _aliasedSelectors.Add(sourceProvider.Alias, sourceProvider.Selector);
        }

        private ChainedSelector FixChainedSelector(ChainedSelector chainedSelector)
        {
            return new ChainedSelector(GetRealSelectorForNodeSelector(chainedSelector.First), chainedSelector.Second);
        }

        private ISelector GetRealSelectorForNodeSelector(ISelector selector)
        {
            var nodeSelector = selector as NodeSelector;
            if (nodeSelector == null)
            {
                return selector;
            }

            ISelector aliasedSelector;
            _aliasedSelectors.TryGetValue(nodeSelector.Name, out aliasedSelector);

            return aliasedSelector ?? new TopLevelSelector(nodeSelector.Name);
        }
    }
}

And now goes the parser:

# -*- coding: utf-8 -*- 

def makeCaseInsensitive(s):
    return ''.join(sum(map(lambda c: ["[", c.lower(), c.upper(), "]"], s), []))

keywords = {
   'SELECT' : 'SELECT',
   'DISTINCT' : 'DISTINCT',
   'ALL' : 'ALL',
   'AS' : 'AS',
   'FROM' : 'FROM',
   'CROSS' : 'CROSS',
   'JOIN' : 'JOIN',
   'NATURAL' : 'NATURAL',
   'INNER' : 'INNER',
   'LEFT' : 'LEFT',
   'RIGHT' : 'RIGHT',
   'FULL' : 'FULL',
   'OUTER' : 'OUTER',
   'OR' : 'OR',
   'AND' : 'AND',
   'NOT' : 'NOT',
   'LIKE' : 'LIKE',
   'ESCAPE' : 'ESCAPE',
   'ON' : 'ON',
   'COUNT' : 'COUNT',
   'AVG' : 'AVG',
   'MAX' : 'MAX',
   'MIN' : 'MIN',
   'SUM' : 'SUM',
   'WHERE' : 'WHERE',
   'IS' : 'IS',
   'NULL' : 'NULL',
   'GROUP' : 'GROUP',
   'ORDER' : 'ORDER',
   'BY' : 'BY',
   'DESC' : 'DESC',
   'SKIP' : 'SKIP',
   'FETCH' : 'FETCH',
   'LEN' : 'LEN',
   'TRIM' : 'TRIM',
   'SUBSTR' : 'SUBSTR',
   'YEAR' : 'YEAR',
   'MONTH' : 'MONTH',
   'DAY' : 'DAY',
   'HOUR' : 'HOUR',
   'MINUTE' : 'MINUTE',
   'SECOND' : 'SECOND',
   'CATENATE' : 'CATENATE',
}

tokens = [
    'IDENTIFIER',
    'ASTERISK',
    'PERIOD',
    'QUESTION_MARK',
    'LEFT_PAREN',
    'RIGHT_PAREN',
    'EQUALS_OPERATOR',
    'NOT_EQUALS_OPERATOR',
    'LESS_THAN_OPERATOR',
    'GREATER_THAN_OPERATOR',
    'LESS_THAN_OR_EQUALS_OPERATOR',
    'GREATER_THAN_OR_EQUALS_OPERATOR',
    'QUOTTED_STRING',
    'COMMA',
    'NUMBER',
] + list(keywords.values())

errors = []

precedence = (
        ("right", 'NATURAL'),
)

t_ignore = ' \t\n\r'

def t_IDENTIFIER(t):
    r"[#a-zA-Z][#a-zA-Z0-9_]*"
    t.type = keywords.get(t.value.upper(),'IDENTIFIER') 
    return t

def t_ASTERISK(t):
    r"[*]"
    return t

def t_PERIOD(t):
    r"[.]"
    return t

def t_QUESTION_MARK(t):
    r"[?]"
    return t

def t_LEFT_PAREN(t):
    r"[(]"
    return t

def t_RIGHT_PAREN(t):
    r"[)]"
    return t

def t_EQUALS_OPERATOR(t):
    r"[=]"
    return t

def t_NOT_EQUALS_OPERATOR(t):
    r"[< ][>]"
    return t

def t_LESS_THAN_OR_EQUALS_OPERATOR(t):
    r"[< ][=]"
    return t

def t_GREATER_THAN_OR_EQUALS_OPERATOR(t):
    r"[>][=]"
    return t

def t_LESS_THAN_OPERATOR(t):
    r"[< ]"
    return t

def t_GREATER_THAN_OPERATOR(t):
    r"[>]"
    return t

def t_COMMA(t):
    r"[,]"
    return t

def t_NUMBER(t):
    r"[0-9]+"
    return t

def t_QUOTTED_STRING(t):
    r"['][^']*?[']"
    t.value = t.value[1:-1]
    return t


def t_error(t):
    global errors
    errors.append("Line {0:3}:\tIllegal character '{1}'".format(t.lexer.lineno, t.value[0]))
    t.lexer.skip(1)

def p_error(p):
    global errors
    if p:
        errors.append("Line {0:3}:\tSyntax error - unexpected '{1}' ".format(p.lineno, str(p.value)))
    else:
        errors.append("Syntax error - unexpected EOF ")
    print errors

def p_start(p):
    """start : querySpecification"""
    p[0] = p[1]


#< query specification>    ::= 
#          SELECT  [ < set quantifier> ] < select list> < table expression>
def p_querySpecification_without_set_quantifier(p):
    """querySpecification : SELECT selectList tableExpression"""
    p[0] = QueryLogic.Selecting.Select(p[3][0], p[3][1])
    p[0].AddRowTransformer(p[2][0])
    for aggregate in p[2][1]:
        p[0].AddAggregate(aggregate)

def p_querySpecification_with_set_quantifier(p):
    """querySpecification : SELECT setQuantifier selectList tableExpression"""
    pass

#< set quantifier>    ::=    DISTINCT  |  ALL
def p_setQuantifier_distinct(p):
    """setQuantifier : DISTINCT"""
    pass

def p_setQuantifier_all(p):
    """setQuantifier : ALL"""
    pass

#< select list>    ::= 
#         < asterisk>
#     |     < select sublist> [ { < comma> < select sublist> }... ]
def p_selectList_asterisk(p):
    """selectList : ASTERISK"""
    p[0] = (QueryLogic.Transformers.RowTransformers.IdentityRowTransformer(), [])

def p_selectList_selectListSelectSublist(p):
    """selectList : selectListSelectSublist"""
    cellTransformers = filter(lambda c: isinstance(c, QueryLogic.Transformers.CellTransformers.ICellTransformer), p[1])
    rowTransformer = QueryLogic.Transformers.RowTransformers.CellsRowTransformer(System.Array[QueryLogic.Transformers.CellTransformers.ICellTransformer](cellTransformers))
    aggregates = filter(lambda c: isinstance(c, QueryLogic.Aggregates.IAggregate), p[1])
    p[0] = (rowTransformer, aggregates)

def p_selectListSelectSublist_first(p):
    """selectListSelectSublist : selectSublist"""
    p[0] = p[1]

def p_selectListSelectSublist_next(p):
    """selectListSelectSublist : selectSublist COMMA selectListSelectSublist"""
    p[0] = p[1] + p[3]

#< select sublist>    ::=   < derived column> 
def p_selectSublist(p):
    """selectSublist : derivedColumn"""
    p[0] = [p[1]]

#< derived column>    ::=   < value expression> [ < as clause> ]
def p_derivedColumn_without_as_clause(p):
    """derivedColumn : valueExpression"""
    if isinstance(p[1], Model.ColumnHeader):
        p[0] = QueryLogic.Transformers.CellTransformers.CellTransformer(QueryLogic.Expressions.CellExpressions.GetOriginalCellCellExpression(p[1]))
    else:
        #p[0] = createAggregate(p[1], '')
        p[0] = parseFunctionOrAggregateStack(p[1], '')

def p_derivedColumn_with_as_clause(p):
    """derivedColumn : valueExpression asClause"""
    if isinstance(p[1], Model.ColumnHeader):
        p[0] = QueryLogic.Transformers.CellTransformers.CellTransformer(QueryLogic.Expressions.CellExpressions.GetOriginalCellCellExpression(p[1]), p[2])
    else:
        #p[0] = createAggregate(p[1], p[2])
        p[0] = parseFunctionOrAggregateStack(p[1], p[2])

def parseFunctionOrAggregateStack(stack, alias):
    innerFunctions, outerFunctions, aggregate = parseWholeStack(stack, [], [], None)

    if aggregate is None:
        expression = CreateChainedCellExpression(innerFunctions)
        result = QueryLogic.Transformers.CellTransformers.CellTransformer(expression, alias)
    elif aggregate == 'COUNT_*':
            result = QueryLogic.Aggregates.CountAggregate(QueryLogic.Expressions.CellExpressions.GetOriginalCellCellExpression(Model.ColumnHeader("", alias)), alias)
    else:
        innerExpression = CreateChainedCellExpression(innerFunctions)
        outerExpression = CreateChainedCellExpression(outerFunctions + [QueryLogic.Expressions.CellExpressions.GetOriginalCellCellExpression(Model.ColumnHeader("", alias))])
        if aggregate == 'AVG':
            result = QueryLogic.Aggregates.AverageAggregate(innerExpression, outerExpression, alias)
        elif aggregate == 'MAX':
            result = QueryLogic.Aggregates.MaximumAggregate(innerExpression, outerExpression, alias)
        elif aggregate == 'MIN':
            result = QueryLogic.Aggregates.MinimumAggregate(innerExpression, outerExpression, alias)
        elif aggregate == 'SUM':
            result = QueryLogic.Aggregates.SumAggregate(innerExpression, outerExpression, alias)
        elif aggregate == 'COUNT':
            result = QueryLogic.Aggregates.CountColumnAggregate(innerExpression, outerExpression, alias)
        else:
            result = None

    return result

def CreateChainedCellExpression(functions):
    if len(functions) > 1:
        selected = functions[0]
        if selected == 'LEN':
            inner = CreateChainedCellExpression(functions[1:])
            return QueryLogic.Expressions.CellExpressions.ChainedCellExpression(QueryLogic.Expressions.CellExpressions.GetLengthCellExpression(inner.Source), inner)
        elif selected == 'TRIM':
            inner = CreateChainedCellExpression(functions[1:])
            return QueryLogic.Expressions.CellExpressions.ChainedCellExpression(QueryLogic.Expressions.CellExpressions.GetTrimmedCellExpression(inner.Source), inner)
        elif selected == 'YEAR':
            inner = CreateChainedCellExpression(functions[1:])
            return QueryLogic.Expressions.CellExpressions.ChainedCellExpression(QueryLogic.Expressions.CellExpressions.GetDatePartCellExpression(inner.Source, QueryLogic.Expressions.CellExpressions.DatePart.Year), inner)
        elif selected == 'MONTH':
            inner = CreateChainedCellExpression(functions[1:])
            return QueryLogic.Expressions.CellExpressions.ChainedCellExpression(QueryLogic.Expressions.CellExpressions.GetDatePartCellExpression(inner.Source, QueryLogic.Expressions.CellExpressions.DatePart.Month), inner)
        elif selected == 'DAY':
            inner = CreateChainedCellExpression(functions[1:])
            return QueryLogic.Expressions.CellExpressions.ChainedCellExpression(QueryLogic.Expressions.CellExpressions.GetDatePartCellExpression(inner.Source, QueryLogic.Expressions.CellExpressions.DatePart.Day), inner)
        elif selected == 'HOUR':
            inner = CreateChainedCellExpression(functions[1:])
            return QueryLogic.Expressions.CellExpressions.ChainedCellExpression(QueryLogic.Expressions.CellExpressions.GetDatePartCellExpression(inner.Source, QueryLogic.Expressions.CellExpressions.DatePart.Hour), inner)
        elif selected == 'MINUTE':
            inner = CreateChainedCellExpression(functions[1:])
            return QueryLogic.Expressions.CellExpressions.ChainedCellExpression(QueryLogic.Expressions.CellExpressions.GetDatePartCellExpression(inner.Source, QueryLogic.Expressions.CellExpressions.DatePart.Minute), inner)
        elif selected == 'SECOND':
            inner = CreateChainedCellExpression(functions[1:])
            return QueryLogic.Expressions.CellExpressions.ChainedCellExpression(QueryLogic.Expressions.CellExpressions.GetDatePartCellExpression(inner.Source, QueryLogic.Expressions.CellExpressions.DatePart.Second), inner)

        elif type(selected) is tuple:
            if selected[0] == 'SUBSTR':
                inner = CreateChainedCellExpression(functions[1:])
                return QueryLogic.Expressions.CellExpressions.ChainedCellExpression(QueryLogic.Expressions.CellExpressions.GetSubstringCellExpression(inner.Source, selected[1], selected[2]), inner)
            if selected[0] == 'CATENATE':
                inner = CreateChainedCellExpression(functions[1:])
                return QueryLogic.Expressions.CellExpressions.ChainedCellExpression(QueryLogic.Expressions.CellExpressions.CatenateStringCellExpression(inner.Source, selected[1]), inner)
        else:
            return None
    else:
        return functions[0]

    return None

def parseWholeStack(stack, innerFunctions, outerFunctions, aggregate):
    if isinstance(stack, Model.ColumnHeader):
        return (innerFunctions + [QueryLogic.Expressions.CellExpressions.GetOriginalCellCellExpression(stack)], outerFunctions, aggregate)

    function, parameters = stack
    if aggregate is None:
        if function == 'AVG' or function == 'MAX' or function == 'MIN' or function == 'SUM' or function == 'COUNT':
            return parseInnerFunctions(parameters[0], [], innerFunctions, function)
        elif function == 'COUNT_*':
            return (innerFunctions, outerFunctions, function)
        
    if function == 'LEN' or function == 'TRIM' or function == 'YEAR' or function == 'MONTH' or function == 'DAY' or function == 'HOUR' or function == 'MINUTE' or function == 'SECOND':
        return parseWholeStack(parameters[0], innerFunctions + [function], outerFunctions, aggregate)
    elif function == 'SUBSTR':
        return parseWholeStack(parameters[0], innerFunctions + [(function, int(parameters[1]), int(parameters[2]))], outerFunctions, aggregate)
    elif function == 'CATENATE':
        return parseWholeStack(parameters[0], innerFunctions + [(function, parameters[1])], outerFunctions, aggregate)
    else:
        return None

def parseInnerFunctions(stack, innerFunctions, outerFunctions, aggregate):
    if isinstance(stack, Model.ColumnHeader):
        return (innerFunctions + [QueryLogic.Expressions.CellExpressions.GetOriginalCellCellExpression(stack)], outerFunctions, aggregate)
    
    function, parameters = stack
    if function == 'LEN' or function == 'TRIM' or function == 'YEAR' or function == 'MONTH' or function == 'DAY' or function == 'HOUR' or function == 'MINUTE' or function == 'SECOND':
        return parseInnerFunctions(parameters[0], innerFunctions + [function], outerFunctions, aggregate)
    elif function == 'SUBSTR':
        return parseInnerFunctions(parameters[0], innerFunctions + [(function, int(parameters[1]), int(parameters[2]))], outerFunctions, aggregate)
    elif function == 'CATENATE':
        return parseInnerFunctions(parameters[0], innerFunctions + [(function, parameters[1])], outerFunctions, aggregate)
    else:
        return None

#< value expression>    ::=      < string value expression>
def p_valueExpression_stringValueExpression(p):
    """valueExpression : stringValueExpression"""
    p[0] = p[1]

#< string value expression>    ::=   < character value expression> | < bit value expression>
def p_stringValueExpression_characterValueExpression(p):
    """stringValueExpression : characterValueExpression"""
    p[0] = p[1]

#< character value expression>    ::=   < character factor>
def p_characterValueExpression_characterFactor(p):
    """characterValueExpression : characterFactor"""
    p[0] = p[1]

#< character factor>    ::=   < character primary>
def p_characterFactor_characterPrimary(p):
    """characterFactor : characterPrimary"""
    p[0] = p[1]

#< character primary>    ::=   < value expression primary>
def p_characterPrimary_valueExpression_Primary(p):
    """characterPrimary : valueExpressionPrimary"""
    p[0] = p[1]

#< value expression primary>    ::= 
#       < unsigned value specification>
#   | < column reference>
#   | < set function specification>
def p_valueExpressionPrimary_unsignedValueSpecification(p):
    """valueExpressionPrimary : unsignedValueSpecification"""
    p[0] = p[1]

def p_valueExpressionPrimary_columnReference(p):
    """valueExpressionPrimary : columnReference"""
    p[0] = p[1]

def p_valueExpressionPrimary_setFunctionSpecification(p):
    """valueExpressionPrimary : setFunctionSpecification"""
    p[0] = p[1]

#< set function specification>    ::= 
#          COUNT < left paren> < asterisk> < right paren>
#     | < general set function>
#< general set function>    ::= 
#         < set function type> < left paren> < value expression> < right paren>
def p_setFunctionSpecification_generalSetFunction(p):
    """setFunctionSpecification : generalSetFunction"""
    p[0] = p[1]

def p_generalSetFunction_valueExpression(p):
    """generalSetFunction : setFunctionType LEFT_PAREN manyValueExpressions RIGHT_PAREN"""
    p[0] = (p[1], p[3])

def p_manyValueExpressions_first(p):
    """manyValueExpressions : valueExpression"""
    p[0] = [p[1]]

def p_manyValueExpressions_mext(p):
    """manyValueExpressions : manyValueExpressions COMMA valueExpression"""
    p[0] = p[1] + [p[3]]


def p_generalSetFunction_ASTERISK(p):
    """generalSetFunction : setFunctionType LEFT_PAREN ASTERISK RIGHT_PAREN"""
    p[0] = ('COUNT_*', p[3])

#< set function type>    ::=    AVG  |  MAX  |  MIN  |  SUM  |  COUNT
def p_setFunctionType(p):
    """setFunctionType : AVG  
        | MAX  
        | MIN  
        | SUM  
        | COUNT
        | LEN
        | TRIM
        | SUBSTR
        | YEAR
        | MONTH
        | DAY
        | HOUR
        | MINUTE
        | SECOND
        | CATENATE"""
    p[0] = p[1]

#< column reference>    ::=   < qualifier> < period> < column name> [ < period> < column name> ... ]
def p_columnReference_columnNames(p):
    """columnReference : qualifier PERIOD columnNames"""
    p[0] = Model.ColumnHeader(p[1], p[3])

def p_columnNames_first(p):
    """columnNames : columnName"""
    p[0] = p[1]

def p_columnNames_next(p):
    """columnNames : columnName PERIOD columnNames"""
    p[0] = p[1] + '.' + p[3]

#< column name>    ::=   < identifier>
def p_columnName(p):
    """columnName : identifier"""
    p[0] = p[1]

#< as clause>    ::=   [  AS  ] < column name>
def p_asClause_without_as_keyword(p):
    """asClause : columnName"""
    p[0] = p[1]

def p_asClause_with_as_keyword(p):
    """asClause : AS columnName"""
    p[0] = p[2]

#< unsigned value specification>    ::=   < unsigned literal>
def p_unsignedValueSpecification(p):
    """unsignedValueSpecification : unsignedLiteral"""
    p[0] = p[1]

#< unsigned literal>    ::=    < unsigned numeric literal> | < general literal>
def p_unsignedLiteral_generalLiteral(p):
    """unsignedLiteral : generalLiteral"""
    p[0] = p[1]

def p_unsignedLiteral_unsignedNumericLiteral(p):
    """unsignedLiteral : unsignedNumericLiteral"""
    p[0] = p[1]

#< unsigned numeric literal>    ::= 
#         < exact numeric literal>
def p_unsignedNumericLiteral_exactNumericLiteral(p):
    """unsignedNumericLiteral : exactNumericLiteral"""
    p[0] = p[1]

#< exact numeric literal>    ::= 
#         < unsigned integer>
def p_exactNumericLiteral_unsignedInteger(p):
    """exactNumericLiteral : unsignedInteger"""
    p[0] = p[1]

#< exact numeric literal>    ::= 
#         < unsigned integer>
def p_unsignedInteger(p):
    """unsignedInteger : NUMBER"""
    p[0] = p[1]

#< general literal>    ::= 
#         < character string literal>
def p_generalLiteral(p):
    """generalLiteral : characterStringLiteral"""
    p[0] = p[1]

#< character string literal>    ::= 
#     QUOTE [ < character representation> ... ] QUOTE
#< character representation>    ::=   < nonquote character>
def p_characterStringLiteral(p):
    """characterStringLiteral : QUOTTED_STRING"""
    p[0] = p[1]

#< identifier>    ::=   < actual identifier>
def p_identifier(p):
    """identifier : actualIdentifier"""
    p[0] = p[1]

#< actual identifier>    ::=   < regular identifier>
def p_actualIdentifier(p):
    """actualIdentifier : regularIdentifier"""
    p[0] = p[1]

#< regular identifier>    ::=   < identifier body>
def p_regularIdentifier(p):
    """regularIdentifier : identifierBody"""
    p[0] = p[1]

#< identifier body>    ::=   < identifier start> [ { < underscore> | < identifier part> } ... ]
#< identifier start>    ::=   !! See the Syntax rules
#< identifier part>    ::=   < identifier start> | < digit>
def p_identifierBody(p):
    """identifierBody : IDENTIFIER"""
    p[0] = p[1]

#< table expression>    ::= 
#         < from clause>
#         [ < where clause> ]
#         [ < group by clause> ]
#         [ < order by clause]
def p_tableExpression_only_fromClause(p):
    """tableExpression : fromClause"""
    p[0] = (p[1], None)

def p_tableExpression_fromClause_whereClause(p):
    """tableExpression : fromClause whereClause"""
    p[0] = (QueryLogic.Filtering.Where(p[1], p[2]), None)

def p_tableExpression_fromClause_groupByClause(p):
    """tableExpression : fromClause groupByClause"""
    p[0] = (QueryLogic.Grouping.GroupBy(p[1], p[2]), None)

def p_tableExpression_fromClause_whereClause_groupByClause(p):
    """tableExpression : fromClause whereClause groupByClause"""
    p[0] = (QueryLogic.Grouping.GroupBy(QueryLogic.Filtering.Where(p[1], p[2]), p[3]), None)

def p_tableExpression_fromClause_orderByClause(p):
    """tableExpression : fromClause orderByClause"""
    p[0] = (p[1], p[2])

def p_tableExpression_fromClause_groupByClause_orderByClause(p):
    """tableExpression : fromClause groupByClause orderByClause"""
    p[0] = (QueryLogic.Grouping.GroupBy(p[1], p[2]), p[3])

def p_tableExpression_fromClause_whereClause_orderByClause(p):
    """tableExpression : fromClause whereClause orderByClause"""
    p[0] = (QueryLogic.Filtering.Where(p[1], p[2]), p[3])

def p_tableExpression_fromClause_whereClause_groupByClause_orderByClause(p):
    """tableExpression : fromClause whereClause groupByClause orderByClause"""
    p[0] = (QueryLogic.Grouping.GroupBy(QueryLogic.Filtering.Where(p[1], p[2]), p[3]), p[4])

#< from clause>    ::=    FROM < table reference>
def p_fromClause(p):
    """fromClause : FROM tableReference"""
    p[0] = p[2]

#< table reference>    ::= 
#         < table name>  < correlation specification> 
#     | < joined table>
def p_tableReference_tableReferenceFrom(p):
    """tableReference : tableReferenceFrom"""
    p[0] = p[1]

def p_tableReference_joinedTable(p):
    """tableReference : joinedTable"""
    p[0] = p[1]

def p_tableReferenceFrom_tableName(p):
    """tableReferenceFrom : tableName correlationSpecification"""
    p[0] = QueryLogic.RelationProviding.From(p[1], p[2])


#< table name>    ::=   < qualified name>
def p_tableName(p):
    """tableName : qualifiedName"""
    p[0] = p[1]

#< qualified name>    ::=    < qualified identifier> [ PERIOD < qualified identifier> ...]
def p_qalifiedName(p):
    """qualifiedName : qualifiedIdentifiers"""
    p[0] = p[1]

def p_qalifiedName_first(p):
    """qualifiedIdentifiers : qualifiedIdentifier"""
    p[0] = p[1]

def p_qalifiedName_next(p):
    """qualifiedIdentifiers : qualifiedIdentifier PERIOD qualifiedIdentifiers"""
    p[0] = QueryLogic.Selectors.ChainedSelector(p[1], p[3])

#< qualified identifier>    ::=   < identifier> | QUESTION_MARK | ASTERISK
def p_qalifiedIdentifier_identifier(p):
    """qualifiedIdentifier : identifier"""
    p[0] = QueryLogic.Selectors.NodeSelector(p[1])

def p_qalifiedIdentifier_QUESTION_MARK(p):
    """qualifiedIdentifier : QUESTION_MARK"""
    p[0] = QueryLogic.Selectors.LevelSelector()

def p_qalifiedIdentifier_ASTERISK(p):
    """qualifiedIdentifier : ASTERISK"""
    p[0] = QueryLogic.Selectors.AnySelector()


#< correlation specification>    ::= 
#          AS   < correlation name> 
#< correlation name>    ::=   < identifier>
def p_correlationSpecification(p):
    """correlationSpecification : AS identifier"""
    p[0] = p[2]

#< joined table>    ::= 
#         < cross join>
#     | < qualified join>
def p_joinedTable_crossJoin(p):
    """joinedTable : crossJoin"""
    p[0] = p[1]

def p_joinedTable_qualifiedJoin(p):
    """joinedTable : qualifiedJoin"""
    p[0] = p[1]

#< cross join>    ::= 
#         < table reference>  CROSS  JOIN < table reference>
def p_crossJoin(p):
    """crossJoin : tableReference CROSS JOIN tableReference"""
    p[0] = QueryLogic.Joins.Implementation.CrossJoin(p[1], p[4])

#< qualified join>    ::= 
#         < table reference>  < join type> JOIN < table reference> [ < join specification> ]
#< join type>    ::= 
#		NATURAL
#	  | INNER 
#     | < outer join type> [  OUTER  ]
#< outer join type>    ::=    LEFT  |  RIGHT  |  FULL
def p_qualifiedJoin_natural(p):
    """qualifiedJoin : tableReference NATURAL JOIN tableReferenceFrom"""
    p[0] = QueryLogic.Joins.Implementation.NaturalJoin(p[1], p[4])

def p_qualifiedJoin_inner(p):
    """qualifiedJoin : tableReference INNER JOIN tableReference joinSpecification"""
    p[0] = QueryLogic.Joins.Implementation.InnerJoin(p[1], p[4], p[5])

def p_qualifiedJoin_left_outer(p):
    """qualifiedJoin : tableReference LEFT OUTER JOIN tableReference joinSpecification"""
    p[0] = QueryLogic.Joins.Implementation.LeftOuterJoin(p[1], p[5], p[6])

def p_qualifiedJoin_left(p):
    """qualifiedJoin : tableReference LEFT JOIN tableReference joinSpecification"""
    p[0] = QueryLogic.Joins.Implementation.LeftOuterJoin(p[1], p[4], p[5])

def p_qualifiedJoin_right_outer(p):
    """qualifiedJoin : tableReference RIGHT OUTER JOIN tableReference joinSpecification"""
    p[0] = QueryLogic.Joins.Implementation.RightOuterJoin(p[1], p[5], p[6])

def p_qualifiedJoin_right(p):
    """qualifiedJoin : tableReference RIGHT JOIN tableReference joinSpecification"""
    p[0] = QueryLogic.Joins.Implementation.RightOuterJoin(p[1], p[4], p[5])

def p_qualifiedJoin_full_outer(p):
    """qualifiedJoin : tableReference FULL OUTER JOIN tableReference joinSpecification"""
    p[0] = QueryLogic.Joins.Implementation.FullOuterJoin(p[1], p[5], p[6])

def p_qualifiedJoin_full(p):
    """qualifiedJoin : tableReference FULL JOIN tableReference joinSpecification"""
    p[0] = QueryLogic.Joins.Implementation.FullOuterJoin(p[1], p[4], p[5])

#< join specification>    ::=   < join condition>
def p_joinSpecification_joinCondition(p):
    """joinSpecification : joinCondition"""
    p[0] = p[1]

#< join condition>    ::=    ON < search condition>
def p_joinCondition_searchCondition(p):
    """joinCondition : ON searchCondition"""
    p[0] = p[2]

#< search condition>    ::= 
#         < boolean term>
#     | < search condition>  OR < boolean term>
def p_searchCondition_booleanTerm(p):
    """searchCondition : booleanTerm"""
    p[0] = p[1]

def p_searchCondition_searchCondition_OR_booleanTerm(p):
    """searchCondition : booleanTerm OR searchCondition"""
    p[0] = QueryLogic.Predicates.Complex.OrPredicate(p[1], p[3])

#< boolean term>    ::= 
#         < boolean factor>
#     | < boolean term>  AND < boolean factor>
def p_booleanTerm_booleanFactor(p):
    """booleanTerm : booleanFactor"""
    p[0] = p[1]

def p_booleanTerm_booleanTerm_AND_booleanFactor(p):
    """booleanTerm : booleanFactor AND booleanTerm"""
    p[0] = QueryLogic.Predicates.Complex.AndPredicate(p[1], p[3])

#< boolean factor>    ::=   [  NOT  ] < boolean test>
def p_booleanFactor_booleanTest(p):
    """booleanFactor : booleanTest"""
    p[0] = p[1]

def p_booleanFactor_NOT_booleanTest(p):
    """booleanFactor : NOT booleanTest"""
    p[0] = QueryLogic.Predicates.Complex.NotPredicate(p[2])

#< boolean test>    ::=   < boolean primary> 
def p_booleanTest_booleanPrimary(p):
    """booleanTest : booleanPrimary"""
    p[0] = p[1]

#< boolean primary>    ::=   < predicate> | < left paren> < search condition> < right paren>
def p_booleanPrimary_predicate(p):
    """booleanPrimary : predicate"""
    p[0] = p[1]

def p_booleanPrimary_LEFT_PAREN_searchCondition_rightParen(p):
    """booleanPrimary : LEFT_PAREN searchCondition RIGHT_PAREN"""
    p[0] = p[2]

#< predicate>    ::= 
#         < comparison predicate>
#     | < like predicate>
#     | < null predicate>
def p_predicate_comparisonPredicate(p):
    """predicate : comparisonPredicate"""
    p[0] = p[1]

def p_predicate_likePredicate(p):
    """predicate : likePredicate"""
    p[0] = p[1]

def p_predicate_nullPredicate(p):
    """predicate : nullPredicate"""
    p[0] = p[1]

#< comparison predicate>    ::=   < row value constructor> < comp op> < row value constructor>
def p_comparisonPredicate(p):
    """comparisonPredicate : rowValueConstructor compOp rowValueConstructor"""
    left = p[1]
    right = p[3]
    operator = p[2]
    if operator == '=':
        predicate = QueryLogic.Predicates.Simple.EqualPredicate(left, right)
    elif operator == '< >':
        predicate = QueryLogic.Predicates.Simple.NotEqualPredicate(left, right)
    elif operator == '>':
        predicate = QueryLogic.Predicates.Simple.GreaterThanPredicate(left, right)
    elif operator == '< ':
        predicate = QueryLogic.Predicates.Simple.LessThanPredicate(left, right)
    elif operator == '>=':
        predicate = QueryLogic.Predicates.Simple.GreaterEqualPredicate(left, right)
    elif operator == '< =':
        predicate = QueryLogic.Predicates.Simple.LessEqualPredicate(left, right)
    else:
        predicate = None
    p[0] = predicate

#< row value constructor>    ::= 
#         < row value constructor element>
def p_rowValueConstructor_rowValueConstructorElement(p):
    """rowValueConstructor : rowValueConstructorElement"""
    p[0] = p[1]

def p_rowValueConstructorElement_valueExpression(p):
    """rowValueConstructorElement : valueExpression"""
    if isinstance(p[1], Model.ColumnHeader):
        p[0] = QueryLogic.Expressions.RowExpressions.GetCellRowExpression(p[1])
    else:
        p[0] = QueryLogic.Expressions.RowExpressions.ConstantRowExpression(p[1])

#< like predicate>    ::=   < match value> [  NOT  ]  LIKE < pattern> [  ESCAPE < escape character> ]
def p_likePredicate_matchValue_LIKE_pattern(p):
    """likePredicate : matchValue LIKE pattern"""
    p[0] = createLike(p[1], p[3], '\0')

def p_likePredicate_matchValue_NOT_LIKE_pattern(p):
    """likePredicate : matchValue NOT LIKE pattern"""
    p[0] = QueryLogic.Predicates.Complex.NotPredicate(createLike(p[1], p[4], '\0'))

def p_likePredicate_matchValue_LIKE_pattern_ESCAPE_escapeCharacter(p):
    """likePredicate : matchValue LIKE pattern ESCAPE escapeCharacter"""
    p[0] = createLike(p[1], p[3], p[5])

def p_likePredicate_matchValue_NOT_LIKE_pattern_ESCAPE_escapeCharacter(p):
    """likePredicate : matchValue NOT LIKE pattern ESCAPE escapeCharacter"""
    p[0] = QueryLogic.Predicates.Complex.NotPredicate(createLike(p[1], p[4], p[6]))

def createLike(value, pattern, escapeCharacter):
    return QueryLogic.Predicates.Simple.LikePredicate(QueryLogic.Expressions.RowExpressions.GetCellRowExpression(value),
                                                      QueryLogic.Expressions.RowExpressions.ConstantRowExpression(pattern),
                                                      escapeCharacter)

#< match value>    ::=   < character value expression>
def p_matchValue(p):
    """matchValue : characterValueExpression"""
    p[0] = p[1]

#< qualifier>    ::=   < table name> | < correlation name>
def p_qualifier_tableName(p):
    """qualifier : identifier"""
    p[0] = p[1]

#< pattern>    ::=   < character value expression>
def p_pattern(p):
    """pattern : characterValueExpression"""
    p[0] = p[1]

#< escape character>    ::=   < character value expression>
def p_escapeCharacter(p):
    """escapeCharacter : characterValueExpression"""
    p[0] = p[1]

#< null predicate>    ::=    IS  [  NOT  ]  NULL
#[AF] Tutaj jak dla mnie jest bug, powinno być < row value constructor> IS [ NOT ] NULL
def p_nullPredicate_isNull(p):
    """nullPredicate : rowValueConstructor IS NULL"""
    p[0] = QueryLogic.Predicates.Simple.IsNullPredicate(p[1])

#< where clause>    ::=    WHERE < search condition>
def p_whereClause(p):
    """whereClause : WHERE searchCondition"""
    p[0] = p[2]

#< group by clause>    ::=    GROUP  BY < grouping column reference list>
def p_groupByClause(p):
    """groupByClause : GROUP BY groupingColumnReferenceList"""
    p[0] = System.Array[QueryLogic.Grouping.GroupingSet]([QueryLogic.Grouping.GroupingSet(System.Array[Model.ColumnHeader](p[3]))])

#< grouping column reference list>    ::=
#         < grouping column reference> [ { < comma> < grouping column reference> }... ]
def p_groupingColumnReferenceList_next(p):
    """groupingColumnReferenceList : groupingColumnReference COMMA groupingColumnReferenceList"""
    p[0] = [p[1]] + p[3]

def p_groupingColumnReferenceList_first(p):
    """groupingColumnReferenceList : groupingColumnReference"""
    p[0] = [p[1]]
    
#< grouping column reference>    ::=   < column reference> 
def p_groupingColumnReference(p):
    """groupingColumnReference : columnReference"""
    p[0] = p[1]

#< order by clause>  ::=     ORDER BY < ordering column reference list> [ SKIP < unsigned integer> ] [ FETCH < unsigned integer> ]
def p_orderByClause_orderingColumnReferenceList(p):
    """orderByClause : ORDER BY orderingColumnReferenceList"""
    p[0] = createOrderBy(p[3], None, None)

def p_orderByClause_orderingColumnReferenceList_skip(p):
    """orderByClause : ORDER BY orderingColumnReferenceList SKIP unsignedInteger"""
    p[0] = createOrderBy(p[3], p[5], None)

def p_orderByClause_orderingColumnReferenceList_fetch(p):
    """orderByClause : ORDER BY orderingColumnReferenceList FETCH unsignedInteger"""
    p[0] = createOrderBy(p[3], None, p[5])

def p_orderByClause_orderingColumnReferenceList_skip_fetch(p):
    """orderByClause : ORDER BY orderingColumnReferenceList SKIP unsignedInteger FETCH unsignedInteger"""
    p[0] = createOrderBy(p[3], p[5], p[7])

def createOrderBy(columns, skip, fetch):
    asc = QueryLogic.Ordering.OrderDirection.Ascending
    desc = QueryLogic.Ordering.OrderDirection.Descending
    skip = castToNullableInt(skip)
    fetch = castToNullableInt(fetch)
    columns = System.Array[QueryLogic.Ordering.ColumnOrdering](
        map(lambda column: QueryLogic.Ordering.ColumnOrdering(column[0], asc if column[1] == 'ASC' else desc),
            columns))
    return QueryLogic.Ordering.OrderBy(columns, skip, fetch)

def castToNullableInt(value):
    return None if value is None else System.Nullable[int](int(value))

#< ordering column reference list>    ::=
#         < ordering column reference> [ { < comma> < ordering column reference> }... ]
def p_orderingColumnReferenceList_next(p):
    """orderingColumnReferenceList : orderingColumnReference COMMA orderingColumnReferenceList"""
    p[0] = [p[1]] + p[3]

def p_orderingColumnReferenceList_first(p):
    """orderingColumnReferenceList : orderingColumnReference"""
    p[0] = [p[1]]
    
#< ordering column reference>    ::=   < column reference> [ DESC ]
def p_orderingColumnReference(p):
    """orderingColumnReference : columnReference"""
    p[0] = (p[1], 'ASC')

def p_orderingColumnReference_DESC(p):
    """orderingColumnReference : columnReference DESC"""
    p[0] = (p[1], 'DESC')

#< comp op>    ::= 
#         < equals operator>
#     | < not equals operator>
#     | < less than operator>
#     | < greater than operator>
#     | < less than or equals operator>
#     | < greater than or equals operator>
def p_compOp_EQUALS_OPERATOR(p):
    """compOp : EQUALS_OPERATOR"""
    p[0] = p[1]

def p_compOp_NOT_EQUALS_OPERATOR(p):
    """compOp : NOT_EQUALS_OPERATOR"""
    p[0] = p[1]

def p_compOp_LESS_THAN_OPERATOR(p):
    """compOp : LESS_THAN_OPERATOR"""
    p[0] = p[1]

def p_compOp_GREATER_THAN_OPERATOR(p):
    """compOp : GREATER_THAN_OPERATOR"""
    p[0] = p[1]

def p_compOp_LESS_THAN_OR_EQUALS_OPERATOR(p):
    """compOp : LESS_THAN_OR_EQUALS_OPERATOR"""
    p[0] = p[1]

def p_compOp_GREATER_THAN_OR_EQUALS_OPERATOR(p):
    """compOp : GREATER_THAN_OR_EQUALS_OPERATOR"""
    p[0] = p[1]


def initialize(plyBasePath):
    global yacc
    global lex
    global sys
    global clr
    global parser
    global lexer
    global System
    global Model
    global QueryLogic

    import imp
    import sys
    import clr

    lex = imp.load_source('ply.lex', plyBasePath + '\\lex.py')
    yacc = imp.load_source('ply.yacc',  plyBasePath + '\\yacc.py')
    lexer = lex.lex(module = sys.modules[__name__], debug=1)
    parser = yacc.yacc(module = sys.modules[__name__])

    clr.AddReference("Model")
    clr.AddReference("QueryLogic")
    clr.AddReference("System")
    import System
    import Model
    import QueryLogic


def parse(text):
    if lex == 0 or yacc == 0:
        raise RuntimeError("Not initialized")

    global errors
    errors = []

    parsedObject = parser.parse(text, lexer=lexer)
    return System.Tuple.Create[QueryLogic.Selecting.Select, System.Array[System.String]](parsedObject, System.Array[System.String](errors))

This is simply an implementation of grammar parsing with some amendments for handling muli-part identifiers and with non-standard construct SKIP FETCH instead of OFFSET FETCH. If you understand PLY then this code should be pretty straightforward. In the next part we will implement the tests.

]]>
https://blog.adamfurmanek.pl/2017/10/14/sqlxd-part-22/feed/ 0
Yet Another Useless Language Part 2 — Grammar https://blog.adamfurmanek.pl/2016/07/23/yet-another-useless-language-part-2/ https://blog.adamfurmanek.pl/2016/07/23/yet-another-useless-language-part-2/#comments Sat, 23 Jul 2016 08:00:11 +0000 https://blog.adamfurmanek.pl/?p=1765 Continue reading Yet Another Useless Language Part 2 — Grammar]]>

This is the second part of the YAUL series. For your convenience you can find other parts in the table of contents in Part 1 — Introduction

Last time we described features of a language we are going to write. Today we are going to define its grammar using EBNF-like notation.

Notation

We will describe the notation using the following syntax:

  • Elements can be written in any case
  • Optional elements are written in square brackets: [Optional]
  • One or more elements are written in curly brackets: {one_or_more}
  • Literals are written in quotation marks: 'literal'
  • Literals are case insensitive
  • Question marks indicates parts described in natural language

Identifiers

We start with defining identifiers. We will handle only latin letters, digits, and underscores. Variable’s name will need to start with letter or underscore:

IDENT = letter_or_underscore , [ { letter_or_underscore_or_digit } ] ;
letter_or_underscore = letter | underscore ;
letter_or_underscore_or_digit = letter_or_underscore | digit;
underscore = '_' ;
digit = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' ;
letter = 'a' | 'b' | 'c' | 'd' | 'e' | 'f' | 'g' | 'h' | 'I' | 'j' | 'k' | 'l' | 'm' | 'n' | 'o' | 'p' | 'q' | 'r' | 's' | 't' | 'u' | 'v' | 'w' | 'x' | 'y' | 'z' | 'A' | 'B' | 'C' | 'D' | 'E' | 'F' | 'G' | 'H' | 'I' | 'J' | 'K' | 'L' | ' M' | 'N' | 'O' | 'P' |' Q' | 'R '| 'S' | 'T' | 'U' | 'V' | 'W' | 'X' | 'Y' | 'Z' ;

We can see that IDENT is a letter_or_underscore, optionally followed by letters, digits, or underscores. We also specified all possible letters and digits we can handle.

Trivial values

Numbers are just digits, we don’t handle fractions:

NUMBER = {digit} ;

Strings are just any printable characters delimited by double quotation marks:

STRING = '"' ; whatever , '"' ;
whatever = ? any_printable_character ? ;

Variables are just identifiers:

variable = IDENT ;

Literals are strings or numbers:

literal = STRING | NUMBER ;

We can modify variable’s value:

setvar = IDENT , '=' , expr , ';' ;

We can allocate new array:

emptyarray = 'NEW' , '[' , NUMBER , ']' ;

We can also allocate array using expressions:

arraydef = '[' list_expr ']'
list_expr = expr , [{ ',' , expr }] ;

We can get or set array’s element using:

arrayelem = IDENT , '[' , expr , ']' ;
setarrayelem = IDENT , '[' , expr , ']' , '=' , expr , ';' ;

Expression can be:

expr = expr , '+' , expr | expr , '-' , expr | expr , '*' , expr | expr , '/' , expr | expr , '%' , expr | proc_call | literal | variable | arrayelem | arraydef | emptyarray ;

We have if:

if_else = 'IF' , '(' , cond_expr , ')' , block_or_statement [ , 'ELSE' , block_or_statement ] ;
cond_expr = expr , '==' , expr | expr , '>=' , expr | expr , '<=' , expr | expr , '>' , expr | expr , '<' , expr | expr , '!=' , expr | '!' , expr ;
block_or_statement = block | statement ;

We have loop:

while = 'WHILE' , '(' , cond_expr , ')' , block_or_statement ;
break = 'BREAK' , ';' ;
continue = 'CONTINUE' , ';' ;

We can print value:

print = 'PRINT' , expr , ';' ;

We can define label — identifier ending with exclamation mark. We can also jump to it:

LABEL = letter_or_underscore , [ { letter_or_underscore_or_digit } , ] '!' ;
jump = 'JUMP' , LABEL , ';' ;

We can declare a function:

function_decl = 'function' , IDENT , '(' , [ list_param , ] ')' , block ;
list_param = [ list_param ',' , ] IDENT ;

We can define its body:

block = '{' , [ list_statement , ] '}' ;
list_statement = { statement } ;
statement = setvar | setarrayelem | if_else | while | proc_call | return | break | continue | print | jump | label ;

We can return value from a function:

return = 'RETURN' , [ expr , ] ';' ;

We can call functions:

proc_call = IDENT , '(' , [ list_expr , ] ')' , ';' ;

Finally, our program is a list of functions or statements:

start = program
program = list_function_statement ;
list_function_statement = [list_function_statement , ] function_decl | [list_function_statement , ] statement };

Summary

OK, we have our grammar. For now it is only for reference, since we will parse it in one of the last parts of this series. Next time we will write some code to represent values in memory and perform basic operations.

]]>
https://blog.adamfurmanek.pl/2016/07/23/yet-another-useless-language-part-2/feed/ 1
Yet Another Useless Language Part 1 — Introduction https://blog.adamfurmanek.pl/2016/07/16/yet-another-useless-language-part-1/ https://blog.adamfurmanek.pl/2016/07/16/yet-another-useless-language-part-1/#comments Sat, 16 Jul 2016 08:00:14 +0000 https://blog.adamfurmanek.pl/?p=1761 Continue reading Yet Another Useless Language Part 1 — Introduction]]>

This is the first part of the Yet Another Useless Language series. For your convenience you can find other parts using the links below :
Part 1 — Introduction
Part 2 — Grammar
Part 3 — Variables
Part 4 — Compiler
Part 5 — Variables
Part 6 — If
Part 7 — Loop
Part 8 — Function
Part 9 — Standard library
Part 10 — Parser
Part 11 — Storing binary on the disk

Today we start implementing simple language compiler using C# and PLY library. Before we dig into code let’s write down language features we would like to implement.

YAUL

We are going to implement scripting language using .NET Expressions for compiling and executing the code. Basically, we will define custom grammar, parse script’s source code using PLY, and create expression tree representing the script. Next, we will compile the expression tree and execute it. Thanks to that we will be able to easily compile the script code to executable binary written in managed code (so it will require .NET Framework to run).

We will implement the following features:

  • local variables declared ad-hoc
  • arrays
  • strings
  • while loop
  • function and recursive functions
  • basic standard library

Our syntax will resemble Python’s, e.g., hello world will look like this:

print "Hello, world";

We will handle integers, arrays, and strings:

number = 5;
text = "Hello!";
array = [1, "two", 3];
big_array = new [50];

We will handle common operations on numbers and strings. We will be able to extract characters from strings using array access operator.

We will handle if with else and while loop with break, continue. We will also handle goto to label.

We will be able to define recursive functions with our without return type. We will also have simple standard library with few functions.

All variables will be treated as local variables (no globals), also accessing the variable before its definition will be marked as an error. Flow instructions like if or while will not change the variables’ scoping — they will not define new scope. All functions will accept variables by value, there will be no passing by reference. Also, function’s parameters will be handled as an ordinary local variables.

Some basic programs written in YAUL:

Fibonacci:

first = 1;
second = 1;
while(first < 1000){
	print first;
	temp = second;
	second = first + second;
	first = temp;
}

Finding prime numbers:

function isPrime(number){
	temp = 2;
	while(temp < number){
		if(number % temp == 0){
			return 0;
		}
		temp = temp + 1;
	}

	return 1;
}

function checkPrimes(low, up){
	temp = low;
	while(temp <= up){
		if(isPrime(temp) == 1){
			print temp;
		}
		temp = temp + 1;
	}
}

checkPrimes(2,100);

Printing pyramid:

function printPyramid(length){
	if(length > 1){
		printPyramid(length - 1);
	}
	temp = 0;
	line = "";
	while(temp < length){
		line = line + "*";
		temp = temp + 1;
	}
	print line;
}

printPyramid(8);

In the next part we are going to write down the grammar for YAUL.

]]>
https://blog.adamfurmanek.pl/2016/07/16/yet-another-useless-language-part-1/feed/ 10
Using PLY in C# https://blog.adamfurmanek.pl/2016/06/18/using-ply-in-c/ https://blog.adamfurmanek.pl/2016/06/18/using-ply-in-c/#respond Sat, 18 Jun 2016 08:00:38 +0000 https://blog.adamfurmanek.pl/?p=1727 Continue reading Using PLY in C#]]> In this post I describe how to use Python Lex-Yacc (PLY) in C# with IronPython.

Introduction

IronPython is a .NET implementation of Python interpreter. It is based on Python 2.7.x and can be executed as separate application or run directly from .NET applications.

PLY is a Python implementation of Lex and Yacc. These tools are used to write compilers — they are capable of parsing file using LALR grammar and produce AST.

In this post I describe how to execute PLY from C# using IronPython. I have verified all steps in Visual Studio 2012-2015.

Let’s go

First, you need to create a project and install IronPython using Nuget. You can do it from package manager command line or by using Visual Studio package manager GUI. After installing IronPython you should have the following references in your project:

  • IronPython
  • IronPython.Modules
  • Microsoft.Dynamic
  • Microsoft.Scripting
  • Microsoft.Scripting.Metadata

Next, you need to download PLY files. Create a directory Lib\ply in your project directory and put there all PLY files (__init__.py, cpp.py, ctokens.py, lex.py, and yacc.py).

Since PLY uses some popular Python packages, you probably need to extract some files from ordinary Python distribution. In order to do that, just download Pyton 2.7.x, zip its Lib directory to python27.zip, and put the file in Lib directory in your project. You can remove redundant files from the archive and leave only ones required by PLY, however, it is not necessary for now.

Make sure, that all Lib files (python27.zip and PLY scripts) are copied to output directory — select them in Solution Explorer, hit F4, and set the flag appropriately. Also, change build action for python27.zip to Embedded Resource.

Now it is time to write some code. First, create a script Parsing\parser.py with method parse accepting text to parse:

def parse(text):
    if lex == 0 or yacc == 0:
        raise RuntimeError("Not initialized")

    global errors
    errors = []

    parsedObject = parser.parse(text, lexer=lexer)
    return parsedObject

This is just a stub of a parsing method which will invoke parsing using PLY. I leave all the rest for you, since the rest is just a PLY code (grammar definitions, code for building AST etc).

You also need another one function in your script, which will be used to load all PLY scripts:

def initialize(plyBasePath):
    global yacc
    global lex
    global sys
    global clr
    global parser
    global lexer
    global System

    import imp
    import sys
    import clr

    lex = imp.load_source('ply.lex', plyBasePath + '\\lex.py')
    yacc = imp.load_source('ply.yacc',  plyBasePath + '\\yacc.py')
    lexer = lex.lex(module = sys.modules[__name__], debug=1)
    parser = yacc.yacc(module = sys.modules[__name__])

    clr.AddReference("System")
    import System

This function defines variables for PLY, loads some .NET libraries, end prepares parser and lexer. Notice that it is running PLY in debug mode but it is only for tutorial purposes.

Make sure that parser.py is copied to output directory.

This is it when it comes to Python part (not including grammar and rest of PLY actual code). Now let’s move to C# part.

Let’s define a class for parser:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Reflection;
using IronPython.Hosting;
using IronPython.Modules;
using Microsoft.Scripting.Hosting;

namespace Parsing
{
    public class Parser
    {
        private static dynamic _ipy;

        public Parser()
        {
            _ipy = _ipy ?? CreateEngine();
        }

        private dynamic CreateEngine()
        {
            ScriptRuntimeSetup setup = Python.CreateRuntimeSetup(GetRuntimeOptions());
            var pyRuntime = new ScriptRuntime(setup);
            ScriptEngine engineInstance = Python.GetEngine(pyRuntime);

            AddPythonLibrariesToSysMetaPath(engineInstance);

            dynamic ipy = pyRuntime.UseFile(@"Parsing\parser.py");
            ipy.initialize(GetPlyPath());

            return ipy;
        }

        private static Dictionary < string, object> GetRuntimeOptions()
        {
            var options = new Dictionary < string, object>();
            options["Debug"] = false;
            return options;
        }

        private static string GetPlyPath()
        {
            return Path.Combine(Environment.CurrentDirectory, "Lib", "ply");
        }
    }
}

We have simple Parser class which creates an IronPython runtime. You can notice that I am passing Debug flag to runtime so it will print out all debugging informations when something wrong happens.

As we can see, first I create a runtime with runtime options. Next, I add Python libraries to Python paths so our scripts will be able to use them. Next, I load a file parser.py and call its initialize method (the one which I described above).

Below is a function which adds Python libraries to path:

private void AddPythonLibrariesToSysMetaPath(ScriptEngine engineInstance)
{
	Assembly asm = GetType().Assembly;
	IEnumerable < string> resQuery =
		from name in asm.GetManifestResourceNames()
		where name.ToLowerInvariant().EndsWith("python27.zip")
		select name;
	string resName = resQuery.Single();
	var importer = new ResourceMetaPathImporter(asm, resName);
	dynamic sys = engineInstance.GetSysModule();
	sys.meta_path.append(importer);
	sys.path.append(importer);
}

First, we get current executing assembly. Next, we access Python libraries stored in a zip file which is in resources. Next, we use importer to extract them in memory and add to path.

Finally, here comes the crucial method for parsing code:

public object Parse(string content)
{
	return _ipy.parse(content);
}

We simply pass a file content to IronPython runtime and call parse method from our script.

]]>
https://blog.adamfurmanek.pl/2016/06/18/using-ply-in-c/feed/ 0