This is the fifth part of the YAUL series. For your convenience you can find other parts in the table of contents in Part 1 — Introduction
Hi! Today we are going to add variable support to our language.
Introduction
In the previous parts we saw the implementation of the variable type used for storing values and performing operations. We also saw the YAUL grammar, which allows us to declare variables, create arrays, and perform operations on values. Variables do not need to be declared explicitly; however, each one must be assigned a value before it is used in subsequent operations.
Before we dig into the C# code, let’s look at the Python code for parsing operations.
PLY
First, we start with defining lexemes.
Tokens
Before we do anything in our language, let’s define symbols, operator precedence, and compound operators:

    literals = [':', ';', ',', '(', ')', '[', ']', '{', '}', '=', '*', '+', '-', '/', '%', '>', '<', '!']

    precedence = (
        ("left", '+', '-'),
        ("left", '*', '/', '%')
    )

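First, we list every single-character symbol allowed in our source code. Any other character that no token rule matches is reported by the error handler and skipped. Next, we define operator precedence: both levels are left-associative, and the multiplicative operators bind tighter than the additive ones. We will use it when it comes to operations on variables.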
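To make the effect of the precedence table concrete, here is a minimal sketch in plain Python. It does not use PLY and it is not how PLY resolves precedence internally (PLY builds LALR tables); it is a small precedence-climbing parser whose binding powers are assumed to mirror the table above. The names BINDING_POWER, tokenize, and parse are made up for this illustration.

```python
import re

# Binding powers mirroring the precedence table: a higher number binds tighter.
BINDING_POWER = {'+': 1, '-': 1, '*': 2, '/': 2, '%': 2}


def tokenize(source):
    """Split a source string into integer and operator tokens."""
    return [int(tok) if tok.isdigit() else tok
            for tok in re.findall(r'\d+|[-+*/%]', source)]


def parse(tokens, min_power=1):
    """Precedence climbing: build a nested tuple (left, operator, right)."""
    left = tokens.pop(0)
    while tokens and BINDING_POWER[tokens[0]] >= min_power:
        op = tokens.pop(0)
        # Left associativity: the right operand may only contain tighter operators.
        right = parse(tokens, BINDING_POWER[op] + 1)
        left = (left, op, right)
    return left


tree = parse(tokenize("1 + 2 * 3 - 4"))
print(tree)
```

The multiplication ends up nested inside the addition, and the subtraction is applied last, which is the grouping the two "left" entries are meant to produce.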
Next, let’s define keywords and operators:

    keywords = {
        'print' : 'PRINT',
        'function' : 'FUNCTION',
        'if' : 'IF',
        'else' : 'ELSE',
        'while' : 'WHILE',
        'return' : 'RETURN',
        'new' : 'NEW',
        'continue' : 'CONTINUE',
        'break' : 'BREAK',
        'jump' : 'JUMP',
    }

    tokens = ['EQEQ', 'GTEQ', 'LSEQ', 'NUMBER', 'STRING', 'IDENT', 'LABEL' ] + list(keywords.values())

    t_EQEQ = r"=="
    t_GTEQ = r">="
    t_LSEQ = r"<="

First, we define a dictionary of keywords. There is a keyword for each common construct, plus a custom keyword for printing variables. We also define how to match operators consisting of two characters.
Let’s also ignore comments in the source code:

    t_ignore_LINE_COMMENT = r'//.*'
    t_ignore_BLOCK_COMMENT = r'/\*((.|\n)*?)\*/'
    t_ignore = ' \t'

We should also handle line numbers and errors in order to present better error descriptions:

    def t_newline(t):
        r'\n+'
        t.lexer.lineno += len(t.value)

    errors = []

    def t_error(t):
        global errors
        errors.append("Line {0:3}:\tIllegal character '{1}'".format(t.lexer.lineno, t.value[0]))
        t.lexer.skip(1)

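The combination of line counting and error collection can be tried out without PLY. The sketch below is a simplified stand-in, not the lexer itself: it walks the source character by character, counts newlines, and records every character outside an assumed allowed set using the same message format. ALLOWED and find_illegal_characters are hypothetical names, and the sketch does not treat string literals specially.

```python
# Assumed allowed set: the literal symbols, the quote, underscore, and blanks.
ALLOWED = set(":;,()[]{}=*+-/%><!\"_ \t")


def find_illegal_characters(source):
    """Collect one error message per illegal character, with its line number."""
    errors = []
    lineno = 1
    for char in source:
        if char == '\n':
            lineno += 1
        elif not (char in ALLOWED or (char.isascii() and char.isalnum())):
            errors.append(
                "Line {0:3}:\tIllegal character '{1}'".format(lineno, char))
    return errors


for message in find_illegal_characters('x = 1;\ny = x @ 2;\n'):
    print(message)
```

Collecting messages in a list instead of raising on the first problem lets the whole file be scanned in one pass and all problems be reported together.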
Literals
We have only two primitives in our language: numbers (integers) and strings. So let’s handle them with PLY:

    def t_STRING(t):
        r'".*?"'
        t.value = t.value.strip(r'"')
        return t

    def t_NUMBER(t):
        r'\d+'
        t.value = int(t.value)
        return t

A string is basically anything delimited by double quotation marks. Notice that we use a non-greedy match, which stops at the first closing quote instead of catching too many characters. Numbers are just a run of digits, nothing more. We do not handle real numbers.
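The difference between the non-greedy and the greedy pattern is easy to see with the standard re module. This is only an illustration of the regular expression, run outside PLY on a made-up input line:

```python
import re

source = 'print "a" + "b";'

# The pattern used by t_STRING: stop at the first closing quote.
lazy = re.findall(r'".*?"', source)

# The same pattern without '?': runs to the last quote on the line.
greedy = re.findall(r'".*"', source)

# Same clean-up as in t_STRING: drop the surrounding quotes.
values = [match.strip('"') for match in lazy]

print(lazy)
print(greedy)
print(values)
```

With the greedy variant the two strings and the plus sign between them would be swallowed as one token.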
OK, we can handle literals.
Variables
Let’s now handle identifiers for variables:

    def t_IDENT(t):
        r'[a-zA-Z_][a-zA-Z0-9_]*'
        t.type = keywords.get(t.value,'IDENT')
        return t

An identifier is anything starting with a letter or underscore, followed by any number of letters, digits, and underscores. We look the lexeme up in the keyword dictionary and store the resulting token type; anything that is not a keyword stays an IDENT.
OK, we are able to match literals and identifiers. Let’s write code to store a value in a variable:
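The keyword lookup can be checked in isolation. The following sketch uses only the standard library; KEYWORDS is a shortened copy of the keyword dictionary and classify is a hypothetical helper that does what t_IDENT does with t.type:

```python
import re

# A subset of the keyword dictionary, enough for the demonstration.
KEYWORDS = {'print': 'PRINT', 'if': 'IF', 'while': 'WHILE'}
IDENT_PATTERN = re.compile(r'[a-zA-Z_][a-zA-Z0-9_]*')


def classify(word):
    """Return the token type for word, or None if it is not an identifier."""
    if IDENT_PATTERN.fullmatch(word) is None:
        return None
    # Keywords win; everything else is a plain identifier.
    return KEYWORDS.get(word, 'IDENT')


for word in ['while', 'while_loop', '_tmp1', '1abc']:
    print(word, classify(word))
```

Matching keywords through the identifier rule, rather than giving each keyword its own pattern, keeps a name such as while_loop from being split into a keyword and a leftover.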

    def p_setvar_assign_variable(p):
        """setvar : IDENT '=' expr ';'"""
        p[0] = Compiler.Assignment()
        p[0].VariableName = p[1]
        p[0].ValueExpression = p[3]

Here we match statements of the form variable = expression. We can match any expression as the right side of the assignment, and we store the data in an Assignment object to use it later. An expression is allowed to have the following form:

    def p_expr_arithmetic(p):
        """expr : expr '+' expr
                | expr '-' expr
                | expr '*' expr
                | expr '/' expr
                | expr '%' expr"""
        p[0] = Compiler.BinaryOperation()
        p[0].Left = p[1]
        p[0].Right = p[3]
        p[0].Sign = p[2]

    def p_expr_others(p):
        """expr : funct_call
                | literal
                | variable
                | arrayelem
                | arraydef
                | emptyarray"""
        p[0] = p[1]

First, we handle operators like addition or multiplication. We parse them and store them as a BinaryOperation, which we will examine later in this series. We can also handle other things as expressions: function calls, literals, other variables, array elements, array definitions, and empty arrays. We will cover most of them in the next parts. For now let’s focus on a few of them.
First, literals. We assume that a literal is a string or a number:

    def p_literal_string(p):
        """literal : STRING
                   | NUMBER """
        p[0] = Compiler.ConstantExpression()
        p[0].Value = p[1]

Since we will handle types in the C# code, we do not need to handle them explicitly here; we just store the value.
Next, let’s handle assigning one variable to another:

    def p_variable(p):
        """variable : IDENT """
        p[0] = Compiler.VariableDereference()
        p[0].Name = p[1]

We treat variable as a VariableDereference. In the C# counterpart we will need to check whether the variable is already defined. If it is, we can perform the assignment.
Arrays
When it comes to arrays, we can create an empty array with a specified size:

    def p_emptyarray(p):
        """ emptyarray : NEW '[' NUMBER ']' """
        p[0] = Compiler.ConstantArray()
        p[0].Value = Array[Compiler.SimpleObject]([Compiler.SimpleObject(0) for n in range(p[3])])

We create a new array and initialize its elements to zeros. We can also define an array as a list of expressions:

    def p_arraydef(p):
        """ arraydef : '[' list_expr ']' """
        p[0] = Compiler.ConstantArray()
        p[0].Value = Array[Compiler.SimpleObject]([Compiler.SimpleObject(n) for n in p[2]])

    def p_list_expr_first(p):
        """list_expr : expr"""
        p[0] = [p[1]]

    def p_list_expr_next(p):
        """list_expr : list_expr ',' expr"""
        p[0] = p[1]
        p[0].append(p[3])

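The two list_expr rules form a left-recursive pair: the first one starts the list and the second one is applied once for every further element. A small sketch in plain Python, with no PLY involved, of the order in which these reductions would fire; reduce_first, reduce_next, and build_list are names invented for this illustration:

```python
def reduce_first(expr):
    """Mirrors the rule list_expr : expr"""
    return [expr]


def reduce_next(items, expr):
    """Mirrors the rule list_expr : list_expr ',' expr"""
    items.append(expr)
    return items


def build_list(exprs):
    """Apply the reductions in the order a left-recursive grammar triggers them."""
    result = reduce_first(exprs[0])
    for expr in exprs[1:]:
        result = reduce_next(result, expr)
    return result


print(build_list([1, 2, 3]))
```

Because the recursion is on the left, the same Python list is extended in place on every step and the elements come out in source order.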
We match the two brackets and extract all values separated by commas. We can also extract an element from an array:

    def p_arrayelem(p):
        """ arrayelem : IDENT '[' expr ']' """
        p[0] = Compiler.ArrayAccess()
        p[0].Name = p[1]
        p[0].Index = p[3]

And we can assign a value to an array’s element:

    def p_setarrayelem_variable(p):
        """setarrayelem : IDENT '[' expr ']' '=' expr ';'"""
        p[0] = Compiler.ArrayAssignment()
        p[0].VariableName = p[1]
        p[0].ValueExpression = p[6]
        p[0].Index = p[3]

That’s it when it comes to the PLY code. Most of it is rather straightforward: we simply extract tokens from the source code and store them in custom classes in order to handle them later. Nothing fancy here.
C#
Now it is time to write C# code to handle values.
Basic variables assignment
Let’s start with assigning variables:

    using System.Linq;
    using System.Linq.Expressions;

    namespace Compiler
    {
        public class Assignment : IStatement
        {
            public string VariableName { get; set; }
            public IExpression ValueExpression { get; set; }

            public Expression Accept(IVisitor visitor)
            {
                Variable dereferencedVariable = visitor.LocalVariables.SingleOrDefault(variable => variable.Name.Equals(VariableName));
                if (dereferencedVariable == null)
                {
                    dereferencedVariable = new Variable
                    {
                        Name = VariableName,
                        VariableReference = Expression.Variable(typeof (SimpleObject), VariableName)
                    };
                    visitor.LocalVariables.Add(dereferencedVariable);
                }

                return Expression.Assign(dereferencedVariable.VariableReference, ValueExpression.Accept(visitor));
            }

            public void FindLabels(IVisitor visitor)
            {
            }
        }
    }

As we saw in the previous part, we store all variables in the visitor object. First, we try to find the variable by name; if there is no such variable, we simply create it inside the if block. Please also notice the Expression.Variable call in the object initializer: this is the code which creates the actual variable using lambdas. This lambda will be translated into an ordinary variable definition of type SimpleObject with the specified name, so this will be something like: SimpleObject name;
Next, in the return statement we assign the value to the variable. We pass our visitor to ValueExpression, so the value will be calculated at runtime (if needed).
Assignment defines no labels, so FindLabels simply does nothing. Please remember that we use labels to perform gotos; we will handle them in the next parts.
To sum up: this C# code creates a new variable if it is needed and assigns a value to it. The variable is created in two places: the first is our local list of variables, so we can track them and handle them correctly; the second is the lambda. The latter is where all the magic happens and the variable is actually created. The assignment is handled in only one place and is performed at runtime. Whether it is translated to assigning a constant or to a function call depends on the result of ValueExpression.Accept.
Please also notice that we do not have scopes. All variables are stored in one collection, and we do not verify whether these variables are properly scoped. We could do it here, but let’s not bother with it for now.
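The lookup-or-create logic can be modelled in a few lines of Python. This is a deliberately simplified sketch: the real C# code builds an expression tree and performs the assignment later at runtime, while this model stores the value immediately. The Variable class and the assign function below are stand-ins written for this illustration, not part of the YAUL code base.

```python
class Variable:
    """Simplified stand-in for the Variable class used by the visitor."""

    def __init__(self, name):
        self.name = name
        self.value = None


def assign(local_variables, name, value):
    """Find the variable by name, create it on first use, then store the value."""
    variable = next((v for v in local_variables if v.name == name), None)
    if variable is None:
        variable = Variable(name)
        local_variables.append(variable)
    variable.value = value
    return variable


local_variables = []
assign(local_variables, "x", 1)
assign(local_variables, "x", 2)   # reuses the existing entry
assign(local_variables, "y", 3)   # first use, so a new entry is created
print([(v.name, v.value) for v in local_variables])
```

Since there is a single flat list and no scopes, a second assignment to the same name always hits the entry created by the first one.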
OK, let’s move on. We can assign constants to variables, so let’s see the C# code:

    using System.Linq.Expressions;

    namespace Compiler
    {
        public class ConstantExpression : IExpression
        {
            public object Value { get; set; }

            public Expression Accept(IVisitor visitor)
            {
                System.Linq.Expressions.ConstantExpression param = Expression.Constant(Value, Value.GetType());
                return Expression.Call(typeof (YaulCompiler), "ConstructSimpleObject", null, new Expression[] { param });
            }

            public void FindLabels(IVisitor visitor)
            {
            }
        }
    }

No magic here: since PLY has already parsed the value, we can simply create a lambda representing the constant (the Expression.Constant call) with the correct type. The type is determined at runtime by examining the value; the parser does not do this, although it could. Next, we return a lambda representing a call to a static function of YaulCompiler which simply creates a new object with the value and returns it. We could replace this lambda with a direct call to the SimpleObject constructor; however, using a helper function is better, since variables are then created in exactly one place.
Let’s now see how we can assign one variable to another. In order to do that, we need to dereference an existing variable. Here is the code:

    using System.Linq;
    using System.Linq.Expressions;

    namespace Compiler
    {
        public class VariableDereference : IExpression
        {
            public string Name { get; set; }

            public Expression Accept(IVisitor visitor)
            {
                var variable = visitor.LocalVariables.FirstOrDefault(v => v.Name.Equals(Name));
                if (variable != null)
                {
                    return variable.VariableReference;
                }

                throw new VariableNotInitializedException(Name);
            }

            public void FindLabels(IVisitor visitor)
            {
            }
        }
    }

First, we traverse the list of variables and try to find one matching by name. If we find it, we return it; otherwise we throw an exception describing the problem.
So now we are able to assign a literal to a variable and a variable to a variable. Let’s handle some more sophisticated scenarios.
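The same find-or-throw pattern, sketched in Python as a rough analogue of the C# class. VariableNotInitializedError and dereference are hypothetical names for this illustration, the error message is made up, and the variable list is modelled as simple (name, reference) pairs:

```python
class VariableNotInitializedError(Exception):
    """Hypothetical Python counterpart of VariableNotInitializedException."""

    def __init__(self, name):
        super().__init__(
            "Variable '{0}' is used before it is assigned".format(name))
        self.name = name


def dereference(local_variables, name):
    """Return the stored reference for name, or raise a descriptive error."""
    for variable_name, reference in local_variables:
        if variable_name == name:
            return reference
    raise VariableNotInitializedError(name)


local_variables = [("x", "ref-to-x")]
print(dereference(local_variables, "x"))
try:
    dereference(local_variables, "y")
except VariableNotInitializedError as error:
    print(error)
```

Carrying the variable name inside the exception lets the caller build a precise message for the user instead of reporting a generic lookup failure.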
Arrays
Let’s start with an empty array:

    using System.Linq.Expressions;

    namespace Compiler
    {
        public class ConstantArray : IExpression
        {
            public object Value { get; set; }

            public Expression Accept(IVisitor visitor)
            {
                return Expression.Constant(new SimpleObject(Value));
            }

            public void FindLabels(IVisitor visitor)
            {
            }
        }
    }

We created the actual array of values in the Python code, so now we only need to create a constant representing the object. Please notice that Expression.Constant means a predefined value, not a constant like const int. We don’t care whether this array has a predefined size or was created from an expression list: we have all the values provided by PLY and we can create the array.
When it comes to accessing array elements, we have the following code:

    using System.Linq;
    using System.Linq.Expressions;

    namespace Compiler
    {
        public class ArrayAccess : IExpression
        {
            public string Name { get; set; }
            public IExpression Index { get; set; }

            public Expression Accept(IVisitor visitor)
            {
                var argument = Index.Accept(visitor);
                return Expression.Call(typeof (SimpleObject), "GetElement", null,
                    new Expression[]{
                        visitor.LocalVariables.Single(variable => variable.Name.Equals(Name))
                            .VariableReference,
                        argument
                    });
            }

            public void FindLabels(IVisitor visitor)
            {
            }
        }
    }

First, we extract the index of the element by calling Index.Accept. Since the index might be a value calculated at runtime, we simply transform it to lambdas and call a helper method from SimpleObject (the Expression.Call). Please also notice that we don’t check whether the variable is defined. If it is not, Single throws an exception for us. Here you can see why it is a good idea to have custom exception types: the generic InvalidOperationException thrown by Single explains much less than a custom type.
Let’s now see the code for changing an array’s element:

    using System.Linq;
    using System.Linq.Expressions;

    namespace Compiler
    {
        public class ArrayAssignment : IStatement
        {
            public string VariableName { get; set; }
            public IExpression ValueExpression { get; set; }
            public IExpression Index { get; set; }

            public Expression Accept(IVisitor visitor)
            {
                Variable dereferencedVariable = visitor.LocalVariables.FirstOrDefault(variable => variable.Name.Equals(VariableName));
                if (dereferencedVariable == null)
                {
                    throw new VariableNotInitializedException(VariableName);
                }

                return Expression.Call(typeof(SimpleObject), "SetElement", null,
                    new Expression[]
                    {
                        dereferencedVariable.VariableReference,
                        Index.Accept(visitor),
                        ValueExpression.Accept(visitor)
                    });
            }

            public void FindLabels(IVisitor visitor)
            {
            }
        }
    }

First, we look for the variable. If we can’t find it, we throw. Next, we call the helper method SetElement and pass all required arguments: the array variable, the array index, and the value for the element.
Arithmetic
There is one more thing which we can cover today: binary operations. In PLY we defined a function for parsing binary operations like addition; here is the C# code for performing the actual calculations:

    using System;
    using System.Linq.Expressions;

    namespace Compiler
    {
        public class BinaryOperation : IExpression
        {
            public IExpression Left { get; set; }
            public IExpression Right { get; set; }
            public string Sign { get; set; }

            public Expression Accept(IVisitor visitor)
            {
                var leftSide = Left.Accept(visitor);
                var rightSide = Right.Accept(visitor);

                switch (Sign)
                {
                    case "+":
                        return Expression.Add(leftSide, rightSide);
                    case "-":
                        return Expression.Subtract(leftSide, rightSide);
                    case "*":
                        return Expression.Multiply(leftSide, rightSide);
                    case "/":
                        return Expression.Divide(leftSide, rightSide);
                    case "%":
                        return Expression.Modulo(leftSide, rightSide);
                    case "<":
                        return Expression.LessThan(leftSide, rightSide);
                    case "<=":
                        return Expression.LessThanOrEqual(leftSide, rightSide);
                    case ">":
                        return Expression.GreaterThan(leftSide, rightSide);
                    case ">=":
                        return Expression.GreaterThanOrEqual(leftSide, rightSide);
                    case "==":
                        return Expression.Equal(leftSide, rightSide);
                    case "!=":
                        return Expression.NotEqual(leftSide, rightSide);
                    default:
                        throw new InvalidOperationException(string.Format("Incorrect operation sign: {0}", Sign));
                }
            }

            public void FindLabels(IVisitor visitor)
            {
            }
        }
    }

This is a bit more code than in the other snippets; however, there is nothing fancy here. We simply try to match the operator and create the correct lambda for performing the operation. You can also notice that we handle many more operators than in PLY (namely, comparison operators). We will explain them in the next parts.
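The same sign-to-operation mapping can be written in Python as a lookup table. This is only a sketch of the dispatch idea: it evaluates plain Python integers directly instead of building expression-tree nodes, and mapping '/' to floor division is an assumption made here because the language has only integers (it differs from C# integer division for negative operands). OPERATIONS and apply_binary are names invented for this example.

```python
import operator

# Sign-to-function table; '/' as floor division is an assumption of this sketch.
OPERATIONS = {
    '+': operator.add,
    '-': operator.sub,
    '*': operator.mul,
    '/': operator.floordiv,
    '%': operator.mod,
    '<': operator.lt,
    '<=': operator.le,
    '>': operator.gt,
    '>=': operator.ge,
    '==': operator.eq,
    '!=': operator.ne,
}


def apply_binary(sign, left, right):
    """Look up the operation for sign and apply it to both operands."""
    try:
        operation = OPERATIONS[sign]
    except KeyError:
        raise ValueError("Incorrect operation sign: {0}".format(sign))
    return operation(left, right)


print(apply_binary('+', 2, 3))
print(apply_binary('%', 7, 3))
print(apply_binary('<=', 2, 2))
```

A table and a switch statement are equivalent here; in both cases an unknown sign ends in an explicit error rather than being ignored.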
Summary
We are now able to assign values to variables. In the next part we are going to examine the if construct.