YAUL – Random IT Utensils https://blog.adamfurmanek.pl IT, operating systems, maths, and more. Wed, 04 Apr 2018 23:17:48 +0000 en-US hourly 1 https://wordpress.org/?v=6.6.2 Yet Another Useless Language Part 11 — Storing binary on the disk https://blog.adamfurmanek.pl/2016/09/24/yet-another-useless-language-part-11/ https://blog.adamfurmanek.pl/2016/09/24/yet-another-useless-language-part-11/#respond Sat, 24 Sep 2016 08:00:23 +0000 https://blog.adamfurmanek.pl/?p=1814 Continue reading Yet Another Useless Language Part 11 — Storing binary on the disk]]>

This is the eleventh part of the YAUL series. For your convenience you can find other parts in the table of contents in Part 1 — Introduction

We already know how to parse YAUL application and execute it. Today we are going to compile it to binary and store it on disk.

Compiling

There is just one thing we need to add to YaulCompiler:

public static void CompileToAssembly(Expression expression, string assemblyName)
{
	var asmName = new AssemblyName(assemblyName);
	var asmBuilder = AssemblyBuilder.DefineDynamicAssembly
		(asmName, AssemblyBuilderAccess.RunAndSave);
	var moduleBuilder = asmBuilder.DefineDynamicModule(assemblyName, $"{assemblyName}.exe");

	var typeBuilder = moduleBuilder.DefineType("Program", TypeAttributes.Public);
	var methodBuilder = typeBuilder.DefineMethod("Main",
		MethodAttributes.Static, typeof(void), new[] { typeof(string) });

	Expression.Lambda< Action>(expression).CompileToMethod(methodBuilder);

	typeBuilder.CreateType();
	asmBuilder.SetEntryPoint(methodBuilder);
	asmBuilder.Save($"{assemblyName}.exe");
}

Every application needs to have static function call Main. We create such function, put it in the assembly, and compile one big lambda representing whole script to this function. Next, we mark this function as entry point, and save it on the disk. Everything thanks to LINQ’s magic.

Summary

Our language is finished. We are able to implement simple applications using YAUL, we can compile them to binary and execute them natively (using .NET platform). We have standard functions library. I hope you find it useful despite the name — Yet Another Useless Language.

]]>
https://blog.adamfurmanek.pl/2016/09/24/yet-another-useless-language-part-11/feed/ 0
Yet Another Useless Language Part 10 — Parser https://blog.adamfurmanek.pl/2016/09/17/yet-another-useless-language-part-10/ https://blog.adamfurmanek.pl/2016/09/17/yet-another-useless-language-part-10/#comments Sat, 17 Sep 2016 08:00:16 +0000 https://blog.adamfurmanek.pl/?p=1811 Continue reading Yet Another Useless Language Part 10 — Parser]]>

This is the tenth part of the YAUL series. For your convenience you can find other parts in the table of contents in Part 1 — Introduction

We have our parser almost done, there are just few things left. Let’s begin.

Program structure

For now we focused on parsing separate constructions. Now it is high time to parse the whole application:

def p_start(p):
    """start : program"""
    p[0] = p[1]

def p_program_not_empty(p):
    """program  : list_function_statement"""
    p[0] = Compiler.FunctionStatementBlock()
    p[0].Statements = System.Collections.Generic.List[Compiler.IStatement] ( p[1][0] )
    p[0].Functions = System.Collections.Generic.List[Compiler.FunctionDefinition]( p[1][1] )

def p_program_empty(p):
    """program  : """
    p[0] = Compiler.FunctionStatementBlock()
    p[0].Statements = System.Collections.Generic.List[Compiler.IStatement] ( [] )
    p[0].Functions = System.Collections.Generic.List[Compiler.FunctionDefinition]( [] )

def p_list_function_statement_first_statement(p):
    """list_function_statement : statement"""
    p[0] = [p[1]], []

def p_list_function_statement_first_function(p):
    """list_function_statement : function_decl"""
    p[0] = [], [p[1]]
        
def p_list_function_statement_next_statement(p):
    """list_function_statement : list_function_statement statement"""
    p[0] = p[1]
    p[0][0].append(p[2])

def p_list_function_statement_next_function(p):
    """list_function_statement : list_function_statement function_decl"""
    p[0] = p[1]
    p[0][1].append(p[2])

Our application is simply a list of functions and statements. We go through the source code and collect all of them.

Error handling

We would like to be able to diagnose common parsing errors, like missing semicolon, wrong syntax, etc. This is called parser resynchronization. Here is the code:

# resynchronization
    
def p_block_error(p):
    """block : '{' error '}'"""
    pass

def p_block_error_multiple(p):
    """block : '{' list_statement error '}'"""
    pass

def p_statement_error(p):
     """statement : error ';'"""

def p_function_decl_with_params2(p):
    """function_decl : error FUNCTION IDENT '(' list_param ')' block """
    p[0] = Compiler.FunctionDefinition()
    p[0].Name = p[3]
    p[0].Body = p[7]
    p[0].Parameters = System.Collections.Generic.List[Compiler.Parameter] (p[5])

def p_error(p):
    global errors
    if p:
        errors.append("Line {0:3}:\tSyntax error - unexpected '{1}' ".format(p.lineno, str(p.value)))
    else:
        errors.append("Syntax error - unexpected EOF ")

    if p.type == '}':
        yacc.errok()  # skip additional }

    if p.type == 'FUNCTION':  # resynchronize on function keyword
        return p

This is just a basic error handling. In practice we would like to have better error descriptions, like what was expected in the code, but for now this is sufficient.

Missing stuff

And here is other parser’s missing stuff:

def initialize(plyBasePath):
    global yacc
    global lex
    global sys
    global clr
    global imp
    global parser
    global lexer
    global System
    global Compiler
    global ConstantExpression

    import imp
    import sys
    import clr
    import os

    clr.AddReference("System")
    clr.AddReference("System.Core")
    clr.AddReference("Compiler");
    import System
    import Compiler

    lex = imp.load_source('ply.lex', plyBasePath + '\\lex.py')
    yacc = imp.load_source('ply.yacc',  plyBasePath + '\\yacc.py')
    lexer = lex.lex(module = sys.modules[__name__], debug=1)
    parser = yacc.yacc(module = sys.modules[__name__], debug=1)


def parse(text):
    return parser.parse(text, lexer=lexer)

if __name__ == '__main__':
    
    if len(sys.argv) != 2:
        print("Not valid number of arguments")
        sys.exit(1)

    filename = sys.argv[1]
    file_content = open(filename, "r").read()

    astTree = parse(file_content)

    if errors:
        print "INTERPRETATION FAILED!"
        for error in errors:
            print error
    else:

        try:
            scriptExpression = Compiler.CreateExpressionFromAST(astTree);
            Compiler.ExecuteScript(scriptExpression);
        except Exception as e:
            print "INTERPRETATION FAILED!"
            print e

        
    
    a=raw_input()

Summary

This is it when it comes to PLY’s part. Next time we are going to finish C#’s part and our language will be complete.

]]>
https://blog.adamfurmanek.pl/2016/09/17/yet-another-useless-language-part-10/feed/ 1
Yet Another Useless Language Part 9 — Standard library https://blog.adamfurmanek.pl/2016/09/10/yet-another-useless-language-part-9/ https://blog.adamfurmanek.pl/2016/09/10/yet-another-useless-language-part-9/#comments Sat, 10 Sep 2016 08:00:04 +0000 https://blog.adamfurmanek.pl/?p=1808 Continue reading Yet Another Useless Language Part 9 — Standard library]]>

This is the ninth part of the YAUL series. For your convenience you can find other parts in the table of contents in Part 1 — Introduction

Today we are going to describe standard library for YAUL.

Introduction

Every language has it’s own set of common functions. This is probably the most important part of language — programmers do not want to implement basic stuff and they want to use existing API. This is why we should provide common functions for manipulating numbers, strings, arrays. Consider LINQ — it is very powerful part of C#, without it we would need to write boring loops over and over again. However, having LINQ we can simply compose functions invocations and quickly transform collections.

Code

Let’s see the implementation:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

namespace Compiler
{
    public class StandardLibrary
    {
        public static IEnumerable< PreparedFunction> GetDeclarations()
        {
            var libraryFunctions = GetFunctions();
            return libraryFunctions.Select(function => function.GetPreparedFunction());
        }

        public static IEnumerable< Expression> GetDefinitions(IEnumerable< PreparedFunction> declarations)
        {
            var libraryFunctions = GetFunctions();
            return declarations.Zip(libraryFunctions, (declaration, definition) => Expression.Assign(declaration.Function, definition.Expression));
        }

        private static IEnumerable< StandardLibraryFunction> GetFunctions()
        {
            return new List< StandardLibraryFunction>
                {
                    new StdMax(),
                    new StdPrint(),
                    new StdCos(),
                    new StdMin(),
                    new StdPow()
                };
        }
    }

    public abstract class StandardLibraryFunction
    {
        public abstract Expression Expression { get; }
        public abstract String Name { get; }

        public PreparedFunction GetPreparedFunction()
        {
            return new PreparedFunction()
            {
                Function = Expression.Variable(Expression.Type, Name),
                Name = Name
            };
        }
    }

    public class StdMax : StandardLibraryFunction
    {
        public override Expression Expression
        {
            get
            {
                Expression< Func< SimpleObject, SimpleObject, SimpleObject>> expression =
                    (obj1, obj2) => obj1 > obj2 ? new SimpleObject(obj1) : new SimpleObject(obj2);

                return expression;
            }
        }

        public override string Name
        {
            get { return "_max"; }
        }
    }

    public class StdMin : StandardLibraryFunction
    {
        public override Expression Expression
        {
            get
            {
                Expression< Func< SimpleObject, SimpleObject, SimpleObject>> expression =
                    (obj1, obj2) => obj1 < obj2 ? new SimpleObject(obj1) : new SimpleObject(obj2);

                return expression;
            }
        }

        public override string Name
        {
            get { return "_min"; }
        }
    }

    public class StdPow : StandardLibraryFunction
    {
        public override Expression Expression
        {
            get
            {
                Expression< Func< SimpleObject, SimpleObject, SimpleObject>> expression =
                    (obj1, obj2) => new SimpleObject(Math.Pow((int) obj1.Value, (int) obj2.Value));

                return expression;
            }
        }

        public override string Name
        {
            get { return "_pow"; }
        }
    }

    public class StdPrint : StandardLibraryFunction
    {
        public static SimpleObject Print(SimpleObject obj)
        {
            Console.WriteLine(obj.Value);
            return new SimpleObject("null");
        }

        public override Expression Expression
        {
            get
            {
                Expression< Func< SimpleObject, SimpleObject>> expression = obj => Print(obj);
                return expression;
            }
        }

        public override string Name
        {
            get { return "_print"; }
        }
    }

    public class StdCos : StandardLibraryFunction
    {
        public override Expression Expression
        {
            get
            {
                Expression< Func< SimpleObject, SimpleObject>> expression = obj => new SimpleObject((int)Math.Cos((int)obj.Value));
                return expression;
            }
        }

        public override string Name
        {
            get { return "_cos"; }
        }
    }
}

We represent every library function as simple expression, so we can implement them using lambdas. As you can see, it is pretty straightforward. Please also notice that we have a function for printing. We can handle it directly in our parser in this way:

def p_print(p):
    """ print : PRINT expr ';' """
    call = Compiler.FunctionCall()
    call.Name = "_print"
    call.Arguments = System.Collections.Generic.List[Compiler.IExpression] ([p[2]])

    p[0] = Compiler.ProdecureCall()
    p[0].FunctionCall = call

It is just a direct call to function provided in a standard library.

Summary

Well, our language is almost complete. Next time we will finish our parser and try to handle parsing errors. In the last part we will see how to compile YAUL programs to binaries and store them on the hard drive.

]]>
https://blog.adamfurmanek.pl/2016/09/10/yet-another-useless-language-part-9/feed/ 1
Yet Another Useless Language Part 8 — Function https://blog.adamfurmanek.pl/2016/09/03/yet-another-useless-language-part-8/ https://blog.adamfurmanek.pl/2016/09/03/yet-another-useless-language-part-8/#comments Sat, 03 Sep 2016 08:00:27 +0000 https://blog.adamfurmanek.pl/?p=1803 Continue reading Yet Another Useless Language Part 8 — Function]]>

This is the eighth part of the YAUL series. For your convenience you can find other parts in the table of contents in Part 1 — Introduction

Today we are going to handle functions in YAUL. Let’s go.

Introduction

We are going to represent functions as blocks of code. Since we want to handle local variables, we will need to add them to scope and remove after creating the function. We also need to handle names duplicates and conflicts with functions defined in standard library. The code will be little longer today.

Parser

Let’s begin with PLY:

def p_function_decl_with_params(p):
    """function_decl : FUNCTION IDENT '(' list_param ')' block """
    p[0] = Compiler.FunctionDefinition()
    p[0].Name = p[2]
    p[0].Body = p[6]
    p[0].Parameters = System.Collections.Generic.List[Compiler.Parameter] (p[4])

def p_function_decl_with_params2(p):
    """function_decl : error FUNCTION IDENT '(' list_param ')' block """
    p[0] = Compiler.FunctionDefinition()
    p[0].Name = p[3]
    p[0].Body = p[7]
    p[0].Parameters = System.Collections.Generic.List[Compiler.Parameter] (p[5])
Comp
def p_function_decl_without_params(p):
    """function_decl : FUNCTION IDENT '(' ')' block """
    p[0] = Compiler.FunctionDefinition()
    p[0].Name = p[2]
    p[0].Body = p[5]
    p[0].Parameters = System.Collections.Generic.List[Compiler.Parameter] ( [] )

It looks very similar to code for loops. One additional thing is list of parameters:

def p_list_param_first(p):
    """list_param : IDENT """
    param = Compiler.Parameter()
    param.Name = p[1]
    p[0] = [param]    

def p_list_param_next(p):
    """list_param : list_param ',' IDENT """
    p[0] = p[1]
    param = Compiler.Parameter()
    param.Name = p[3]
    p[0].append(param)

However, there is a very special keyword for functions: return. We can return value or just exit the function:

def p_return_void(p):
    """return : RETURN ';' """
    p[0] = Compiler.Compiler.Return()

def p_return_value(p):
    """return : RETURN expr ';' """
    p[0] = Compiler.Return()
    p[0].ReturnedValue = p[2]

What’s more, we would like to be able to call our functions:

def p_proc_call(p):
    """proc_call : funct_call ';' """
    p[0] = Compiler.ProdecureCall()
    p[0].FunctionCall = p[1]

def p_funct_call_normal_none_params(p):
    """funct_call : IDENT '(' ')'"""
    p[0] = Compiler.FunctionCall()
    p[0].Name = p[1]
    p[0].Arguments = System.Collections.Generic.List[Compiler.IExpression]()

def p_funct_call_normal_with_params(p):
    """funct_call : IDENT '(' list_expr ')'"""
    p[0] = Compiler.FunctionCall()
    p[0].Name = p[1]
    p[0].Arguments = System.Collections.Generic.List[Compiler.IExpression](p[3])

This is it, nothing tricky here.

C#

We start with function definition:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

namespace Compiler
{
    public class FunctionDefinition : ISemanticObject
    {
        public StatementBlock Body { get; set; }
        public List< Parameter> Parameters { get; set; }

        public string Name { get; set; }

        public Expression Accept(IVisitor visitor)
        {
            var parameterExpressions = CreateParamsExpression(visitor);
            AddParamsToScope(visitor, parameterExpressions);
            
            DefineReturnLabel(visitor);

            var bodyExpression = Body.Accept(visitor);

            RemoveParamsFromScope(visitor, parameterExpressions);

            var usedVariables = GetUsedVariables(visitor);

            var newBodyExpression = DecorateWithReturnLabel(visitor, bodyExpression, usedVariables);

            var functionExpression = CreateFunctionExpression(visitor, newBodyExpression, parameterExpressions);

            visitor.LocalVariables.Clear();
            
            return functionExpression;
        }

        private BinaryExpression CreateFunctionExpression(IVisitor visitor, BlockExpression newBodyExpression,
                                                          IEnumerable< ParameterExpression> parameterExpressions)
        {
            var methodVar = (ParameterExpression) visitor.Functions.Single(func => func.Name.Equals(Name)).Function;
            var functionExpression = Expression.Assign(methodVar,
                                                       Expression.Lambda(newBodyExpression, Name, parameterExpressions));
            return functionExpression;
        }

        private static BlockExpression DecorateWithReturnLabel(IVisitor visitor, Expression bodyExpression, IEnumerable< ParameterExpression> usedVariables)
        {
            Expression defaultValue = Expression.Call(typeof (YaulCompiler), "ConstructSimpleObject", null,
                                                      new Expression[] {Expression.Constant("", typeof (string))});
            var labelExpression = Expression.Label(visitor.ReturnLabel, defaultValue);

            var bodyAndLabel = new List< Expression> {bodyExpression, labelExpression};            
            return Expression.Block(usedVariables, bodyAndLabel);
        }

        private static IEnumerable< ParameterExpression> GetUsedVariables(IVisitor visitor)
        {
            return visitor.LocalVariables.Select(variable => variable.VariableReference);
        }

        private static void RemoveParamsFromScope(IVisitor visitor, List< ParameterExpression> parameterExpressions)
        {
            for(int i = visitor.LocalVariables.Count - 1; i >= 0; --i)
            {
                if (parameterExpressions.Contains(visitor.LocalVariables[i].VariableReference))
                {
                    visitor.LocalVariables.RemoveAt(i);
                }
            }
        }

        private static void DefineReturnLabel(IVisitor visitor)
        {
            visitor.ReturnLabel = Expression.Label(typeof(SimpleObject));
        }

        private void AddParamsToScope(IVisitor visitor, IEnumerable< ParameterExpression> parameterExpressions)
        {
            var variables = parameterExpressions.Select(CreateVariableFromParam);
            foreach(var variable in variables)
            {
                visitor.LocalVariables.Add(variable);
            }
            
        }

        private Variable CreateVariableFromParam(ParameterExpression parameterExpression)
        {
            return new Variable
                {
                    Name = parameterExpression.Name,
                    VariableReference = parameterExpression
                };
        }

        private List< ParameterExpression> CreateParamsExpression(IVisitor visitor)
        {
            return Parameters.Select(parameter => (ParameterExpression) parameter.Accept(visitor)).ToList();
        }

        public void FindLabels(IVisitor visitor)
        {
            
        }

        public PreparedFunction Prepare()
        {
            var type = DetermineType();
            var variable = Expression.Variable(type, Name);

            return new PreparedFunction
                {
                    Function = variable,
                    Name = Name
                };

        }

        private Type DetermineType()
        {
            var parameterExpressions = Parameters.Select(parameter => (ParameterExpression) parameter.Accept(null)).ToList();
            
            LabelTarget returnLabel = Expression.Label(typeof (SimpleObject));
            Expression defaultValue = Expression.Call(typeof (YaulCompiler), "ConstructSimpleObject", null,
                                                      new Expression[] {Expression.Constant("", typeof (string))});

            var simpleBody = Expression.Label(returnLabel, defaultValue);

            var declaration = Expression.Lambda(simpleBody, Name, parameterExpressions);

            return declaration.Type;
        }
    }
}

Let’s examin Accept function. First, we create expressions for parameters. Next, we add them to local scope so they can be used as local variables. Next, we visit body. Since we have local variables in scope, body is free to use them. When it is done, we remove parameters from scope. Next, we create block for function, create expression holding our function, clear local parameters, and we are done.

PreparedFunction looks as follows:

using System.Linq.Expressions;

namespace Compiler
{
    public class PreparedFunction
    {
        public Expression Function { get; set; }
        public string Name { get; set; }
    }
}

Function’s parameters are defined as:

using System.Linq.Expressions;

namespace Compiler
{
    public class Parameter : ISemanticObject
    {
        public string Name { get; set; }
        public Expression Accept(IVisitor visitor)
        {
            return Expression.Parameter(typeof (SimpleObject), Name);
        }

        public void FindLabels(IVisitor visitor)
        {
        }
    }
}

And return instruction:

using System.Linq.Expressions;

namespace Compiler
{
    public class Return : IStatement
    {
        public IExpression ReturnedValue { get; set; }

        public Expression Accept(IVisitor visitor)
        {
            if (ReturnedValue != null)
            {
                var value = ReturnedValue.Accept(visitor);
                return Expression.Return(visitor.ReturnLabel, value, typeof(SimpleObject));
            }
            Expression defaultValue = Expression.Call(typeof (YaulCompiler), "ConstructSimpleObject", null, new Expression[] { Expression.Constant("", typeof(string)) });
            return Expression.Return(visitor.ReturnLabel, defaultValue, typeof(SimpleObject));
        }

        public void FindLabels(IVisitor visitor)
        {
        }
    }
}

Please note, that we always return something, but in case when we want to exit the function, we simply ignore the value.

Anod now the last part, calling functions:

using System.Linq.Expressions;

namespace Compiler
{
    public class ProdecureCall : IStatement
    {
        public FunctionCall FunctionCall;

        public Expression Accept(IVisitor visitor)
        {
            return FunctionCall.Accept(visitor);
        }

        public void FindLabels(IVisitor visitor)
        {
        }
    }
}

using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

namespace Compiler
{
    public class FunctionCall : IExpression
    {
        public string Name { get; set; }
        public IList< IExpression> Arguments { get; set; }

        public Expression Accept(IVisitor visitor)
        {
            var function = visitor.Functions.SingleOrDefault(func => func.Name.Equals(Name));
            if (function != null)
            {
                var argumentsExpression = Arguments.Select(argument => argument.Accept(visitor));
                var lambda = function;
                return Expression.Invoke(lambda.Function, argumentsExpression);
            }

            throw new MethodNotFoundException(Name);
        }

        public void FindLabels(IVisitor visitor)
        {
        }
    }
}

Summary

Function handling is the most difficult part in our language, however, there is no magic. Wee simply need to write a little more code to correctly handle and create function’s block. Next time we are going to handle standard library.

]]>
https://blog.adamfurmanek.pl/2016/09/03/yet-another-useless-language-part-8/feed/ 1
Yet Another Useless Language Part 7 — Loop https://blog.adamfurmanek.pl/2016/08/27/yet-another-useless-language-part-7/ https://blog.adamfurmanek.pl/2016/08/27/yet-another-useless-language-part-7/#comments Sat, 27 Aug 2016 08:00:46 +0000 https://blog.adamfurmanek.pl/?p=1798 Continue reading Yet Another Useless Language Part 7 — Loop]]>

This is the seventh part of the YAUL series. For your convenience you can find other parts in the table of contents in Part 1 — Introduction

Last time we implemented conditional expression. Today we are going to implement another basic imperative concept — loop.

Introduction

We are going to handle while loop. We will also handle common keywords allowing to control execution: break, continue, and goto.

Parser

As usual, we start with parser. First, loop:

def p_while(p):
    """while : WHILE '(' cond_expr ')' block_or_statement"""
    p[0] = Compiler.WhileLoop()
    p[0].Condition = p[3]
    p[0].Body = p[5]

Looks almost the same as if code. We simply check condition and execute block of code. Let’s see what is exactly the block_or_statement:

def p_block_or_statement_block(p):
    """block_or_statement : block"""
    p[0] = p[1]

def p_block_or_statement_statement(p):
    """block_or_statement : statement"""
    p[0] = p[1]

def p_block_empty(p):
    """block : '{' '}' """
    p[0] = Compiler.StatementBlock()
    p[0].Statements = System.Collections.Generic.List[Compiler.IStatement]([])

def p_block_not_empty(p):
    """block : '{' list_statement '}' """
    p[0] = Compiler.StatementBlock()
    p[0].Statements = System.Collections.Generic.List[Compiler.IStatement](p[2])

def p_list_statement_first(p):
    """list_statement : statement"""
    p[0] = [p[1]]
    

def p_list_statement_next(p):
    """list_statement : list_statement statement"""
    p[0] = p[1]
    p[0].append(p[2])
    

def p_statement(p):
    """statement : setvar
                 | setarrayelem
                 | if_else
                 | while
                 | proc_call
                 | return
                 | break
                 | continue
                 | print 
                 | jump
                 | label """
    p[0] = p[1]

The code should be obvious. We simply handle list of statements, where each statement is one of language’s instructions. Let’s now handle keywords for controlling loop:

def p_break(p):
    """break : BREAK ';' """
    p[0] = Compiler.Break()

def p_continue(p):
    """continue : CONTINUE ';' """
    p[0] = Compiler.Continue()

Looks pretty easy, just a keyword with semicolon at the end. Let’s now see the labels and goto:

def t_LABEL(t):
    r'[a-zA-Z_][a-zA-Z0-9_]*!'
    t.value = t.value.strip('!')
    return t

def p_label(p):
    """ label : LABEL"""
    p[0] = Compiler.Label()
    p[0].Name = p[1]

def p_jump(p):
    """ jump : JUMP LABEL ';' """
    p[0] = Compiler.Jump()
    p[0].Target = Compiler.Label();
    p[0].Target.Name = p[2]

Once again, nothing special.

C#

Let’s dig into C# code. First, loop:

using System;
using System.Linq.Expressions;

namespace Compiler
{
    public class WhileLoop : IStatement
    {
        public IExpression Condition { get; set; }
        public StatementBlock Body { get; set; }

        public WhileLoop()
        {
            Body = new StatementBlock();
        }

        public Expression Accept(IVisitor visitor)
        {
            LabelTarget breakLabel = Expression.Label();
            LabelTarget continueLabel = Expression.Label();
            Expression breakExpression = Expression.Break(breakLabel);
            Expression continueExpression = Expression.Continue(continueLabel);

            visitor.BreakExpressions.Add(breakExpression);
            visitor.ContinueExpressions.Add(continueExpression);

            var result = Expression.Loop(
                Expression.IfThenElse(
                    Condition.Accept(visitor), 
                    Expression.Block(
                        Body.Accept(visitor),
                        continueExpression
                    ),
                    breakExpression
                ), 
                breakLabel, 
                continueLabel);


            visitor.BreakExpressions.Remove(breakExpression);
            visitor.ContinueExpressions.Remove(continueExpression);

            return result;
        }

        public void FindLabels(IVisitor visitor)
        {
            Body.FindLabels(visitor);
        }
    }
}

First, before visiting the body we create instructions for break and continue instructions. We add them to visitor, so it knows which instructions use to control the loop. Next, we create a loop — it consists of expression for checking condition, expression for loop’s block, and expressions for controlling the loop. Check MSDN for more details. We also need to handle labels in loop’s body.

Let’s now see blocks of code:

using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

namespace Compiler
{
    public class StatementBlock : ISemanticObject
    {
        public List< IStatement> Statements { get; set; }
        
        public Expression Accept(IVisitor visitor)
        {
            FindLabels(visitor);
            var generatedStatements = GenerateContent(visitor);
            
            if (generatedStatements.Count > 0)
            {
                return Expression.Block(generatedStatements);
            }

            return Expression.Empty();
        }

        private IList< Expression> GenerateContent(IVisitor visitor)
        {
            return Statements.Select(statement => statement.Accept(visitor)).Where(statement => statement != null).ToList();
        }

        public void FindLabels(IVisitor visitor)
        {
            foreach (var each in Statements)
            {
                each.FindLabels(visitor);
            }
        }
    }
}

No magic here. We simply iterate over statements and collect them into one block. Statement is:

namespace Compiler
{
    public interface IStatement : ISemanticObject
    {
        
    }
}

Well, it is very simple. Statements are just particular instructions and they are handled somewhere else, in implementations.

OK, let’s now see control keywords:

using System;
using System.Linq;
using System.Linq.Expressions;

namespace Compiler
{
    public class Break : IStatement
    {
        public Expression Accept(IVisitor visitor)
        {
            if (visitor.BreakExpressions.Count == 0)
            {
                throw new InvalidOperationException("Cannot break - must be inside loop");
            }

            return visitor.BreakExpressions.Last();
        }

        public void FindLabels(IVisitor visitor)
        {
        }
    }
}

using System;
using System.Linq;
using System.Linq.Expressions;

namespace Compiler
{
    public class Continue : IStatement
    {
        public Expression Accept(IVisitor visitor)
        {
            if (visitor.ContinueExpressions.Count == 0)
            {
                throw new InvalidOperationException("Cannot continue - must be inside loop");
            }

            return visitor.ContinueExpressions.Last();
        }

        public void FindLabels(IVisitor visitor)
        {
        }
    }
}

We simply extract most recent instructions from visitor or throw exceptions if they do not exist.

And now the last part, gotos:

using System.Linq.Expressions;

namespace Compiler
{
    public class Label : IStatement
    {
        public string Name { get; set; }
        public LabelTarget Target { get; set; }

        public Expression Accept(IVisitor visitor)
        {
            return Expression.Label(Target);
        }

        public void FindLabels(IVisitor visitor)
        {
            Target = Expression.Label(Name);
            visitor.Labels.Add(Target);
        }
    }
}

using System.Linq;
using System.Linq.Expressions;

namespace Compiler
{
    public class Jump : IStatement
    {
        public Label Target { get; set; }

        public Expression Accept(IVisitor visitor)
        {
            return Expression.Goto(visitor.Labels.First(label => label.Name == Target.Name));
        }

        public void FindLabels(IVisitor visitor)
        {
        }
    }
}

Since LINQ handles labels and jumps, this is pretty straightforward. We simply create label, store it in visitor, and jump to it.

Summary

We are now able to handle loops. Next time we will handle another big thing: functions.

]]>
https://blog.adamfurmanek.pl/2016/08/27/yet-another-useless-language-part-7/feed/ 1
Yet Another Useless Language Part 6 — If https://blog.adamfurmanek.pl/2016/08/20/yet-another-useless-language-part-6/ https://blog.adamfurmanek.pl/2016/08/20/yet-another-useless-language-part-6/#comments Sat, 20 Aug 2016 08:00:10 +0000 https://blog.adamfurmanek.pl/?p=1794 Continue reading Yet Another Useless Language Part 6 — If]]>

This is the sixth part of the YAUL series. For your convenience you can find other parts in the table of contents in Part 1 — Introduction

We continue developing custom language on .NET platform using LINQ expressions. Today we are going to implement conditional operator.

Grammar

We start with PLY grammar. Code for if is pretty simple:

def p_if_else_with_else(p):
    """if_else : IF '(' cond_expr ')' block_or_statement ELSE block_or_statement """
    p[0] = Compiler.If()
    p[0].Condition = p[3]
    p[0].TrueBlock = p[5]
    p[0].FalseBlock = p[7]

def p_if_else_without_else(p):
    """if_else : IF '(' cond_expr ')' block_or_statement"""
    p[0] = Compiler.If()
    p[0].Condition = p[3]
    p[0].TrueBlock = p[5]

We define if as keyword followed by condition in parenthesis, and statements. We need to handle matching else block. This part is a common source of ambiguity in parsers — imagine the following code:

if (something)
if (something2)
    print("Conditions met")
else
    print ("Condition not met")

Code is poorly formatted deliberately. We have two ifs and only one else — the question is: is the else clause matching first or second if? This problem occurs in most of today’s languages and is usually solved by assumption that the else matches closes if. So we should parse this code as:

if (something) {
    if (something2)
        print("Conditions met")
    else
        print ("Condition not met")
}

and not as:

if (something) {
    if (something2)
        print("Conditions met")
} else {
    print ("Condition not met")
}

In grammar’s terms this is known as reduce-reduce conflict.

Let’s move on. We need to parse conditions:

def p_cond_expr(p):
    """cond_expr : expr EQEQ expr
                 | expr GTEQ expr
                 | expr LSEQ expr
                 | expr NTEQ expr
                 | expr '>' expr
                 | expr '<' expr"""
    p[0] = Compiler.BinaryOperation()
    p[0].Left = p[1]
    p[0].Right = p[3]
    p[0].Sign = p[2]

This part is pretty easy. We handle common relational operators making a binary operation.

C# part

Binary operation was described in previous part. Now code for if:

using System.Linq.Expressions;

namespace Compiler
{
    public class If : IStatement
    {
        public IExpression Condition { get; set; }
        public StatementBlock TrueBlock { get; set; }
        public StatementBlock FalseBlock { get; set; }

        public If()
        {
            TrueBlock = new StatementBlock();
            FalseBlock = new StatementBlock();
        }

        public Expression Accept(IVisitor visitor)
        {
            if (FalseBlock.Statements != null)
            {
                return Expression.IfThenElse(Condition.Accept(visitor), TrueBlock.Accept(visitor),
                                             FalseBlock.Accept(visitor));
            }
            return Expression.IfThen(Condition.Accept(visitor), TrueBlock.Accept(visitor));
        }

        public void FindLabels(IVisitor visitor)
        {
            TrueBlock.FindLabels(visitor);
            if (FalseBlock.Statements != null)
            {
                FalseBlock.FindLabels(visitor);
            }
        }
    }
}

Nothing difficult here. We have only two cases to handle, if with matching else or without. We also need to recursively find labels in blocks.

Summary

We are now able to handle conditions in YAUL. Next time we will examine basic loop concept.

]]>
https://blog.adamfurmanek.pl/2016/08/20/yet-another-useless-language-part-6/feed/ 1
Yet Another Useless Language Part 5 — Variables https://blog.adamfurmanek.pl/2016/08/13/yet-another-useless-language-part-5/ https://blog.adamfurmanek.pl/2016/08/13/yet-another-useless-language-part-5/#comments Sat, 13 Aug 2016 08:00:58 +0000 https://blog.adamfurmanek.pl/?p=1781 Continue reading Yet Another Useless Language Part 5 — Variables]]>

This is the fifth part of the YAUL series. For your convenience you can find other parts in the table of contents in Part 1 — Introduction

Hi! Today we are going to add variables support to our language.

Introduction

In previous parts we saw implementation of variable type for storing values and performing operations. We also saw the grammar for YAUL which allows us to declare variables, create arrays, and perform operations on values. We do not need to declare variables explicitly, however, we need to assign them any value before they are used in subsequent operations.

Before we dig into C# code, let’s see the Python code for parsing operations.

PLY

First, we start with defining lexems.

Tokens

Before we do anything in our language, let’s define symbols, operator precedence, and compound operators:

literals = [':', ';', ',', '(', ')', '[', ']', '{', '}', 
            '=', '*', '+', '-', '/', '%', '>', '<', '!']

precedence = (
        ("left", '+', '-'),
        ("left", '*', '/', '%')   
    )

First, we define all allowed characters in our source code. Any other character will be ignored silently. Next, we define operators’ precedence. We will use it when it comes to operations on variables.

Next, let’s define keywords and operators:

keywords = {
    'print'     : 'PRINT',
    'function'  : 'FUNCTION',
    'if'        : 'IF',
    'else'      : 'ELSE',
    'while'     : 'WHILE',
    'return'    : 'RETURN',
    'new'       : 'NEW',
    'continue'  : 'CONTINUE',
    'break'     : 'BREAK',
    'jump'      : 'JUMP',
}

tokens = ['EQEQ',
          'GTEQ',
          'LSEQ',
          'NUMBER',
          'STRING',
          'IDENT',
          'LABEL'
          ] + list(keywords.values())

t_EQEQ = r"=="
t_GTEQ = r">="
t_LSEQ = r"<="

First, we define a list of keywords. You can see that we have keyword for all common constructions, we also have custom keyword for printing variables. We also define how to match operators consisting of two characters.

Let’s also ignore comments in the source code:

t_ignore_LINE_COMMENT = r'//.*'
t_ignore_BLOCK_COMMENT = r'/\*((.|\n)*?)\*/'
t_ignore = ' \t'

We should also handle line numbers and errors in order to present better error descriptions:

def t_newline(t):
    r'\n+'
    t.lexer.lineno += len(t.value)
	
errors = []

def t_error(t):
    global errors
    errors.append("Line {0:3}:\tIllegal character '{1}'".format(t.lexer.lineno, t.value[0]))
    t.lexer.skip(1)

Literals

We have only two primitives in our language: numbers (integers) and strings. So let’s handle them with PLY:

def t_STRING(t):
    r'".*?"'
    t.value = t.value.strip(r'"');
    return t

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

String is basically anything delimited with double quotation marks. Notice that we have non-greedy match in order to not catch too many characters. Numbers are just bunch of digits, nothing more. We do not handle real numbers.

OK, we can handle literals.

Variables

Let’s now handle identifiers for variables:

def t_IDENT(t):
    r'[a-zA-Z_][a-zA-Z0-9_]*'
    t.type = keywords.get(t.value,'IDENT')
    return t

Identifier is anything starting with letter or underscore, followed by any number of letters, digits, and underscores. We parse the lexem and store its type.

OK, we are able to match literals and identifiers. Let’s write code to store value in the variable:

def p_setvar_assign_variable(p):
    """setvar : IDENT '=' expr ';'"""
    p[0] = Compiler.Assignment()
    p[0].VariableName = p[1]
    p[0].ValueExpression = p[3]

Here we match expressions in form variable = literal. We can match any expression as right side of assignment, and store data in Assignment object to utilize it later. Expression is allowed to have the following form:

def p_expr_arithmetic(p):
    """expr : expr '+' expr
            | expr '-' expr
            | expr '*' expr
            | expr '/' expr
            | expr '%' expr"""
    p[0] = Compiler.BinaryOperation()
    p[0].Left = p[1]
    p[0].Right = p[3]
    p[0].Sign = p[2]

def p_expr_others(p):
    """expr : funct_call
            | literal
            | variable
            | arrayelem
            | arraydef
            | emptyarray"""

    p[0] = p[1]

First, we handle operators like addition or multiplication. We parse them and store as BinaryOperation which we will examine in next parts of this series. We can also handle different things as expressions: function calls, literals, other variables, array elements, array definitions, and empty arrays. We will cover most of them in next parts. For now let’s focus on few of them.

First, literals. We assume that literal is string or number:

def p_literal_string(p):
    """literal : STRING
               | NUMBER """
    p[0] = Compiler.ConstantExpression()
    p[0].Value = p[1]

Since we will handle types in C# code, we do not need to handle them explicitly, we just store the value.

Next, let’s handle assigning one variable to another:

def p_variable(p):
    """variable : IDENT """
    p[0] = Compiler.VariableDereference()
    p[0].Name = p[1]

We treat variable as VariableDereference. In the C# counterpart we will need to check whether the variable is already defined. If it is so — we can perform assignment.

Arrays

When it comes to arrays, we can create empty array with specified size:

def p_emptyarray(p):
    """ emptyarray : NEW '[' NUMBER ']' """
    p[0] = Compiler.ConstantArray()
    p[0].Value = Array[Compiler.SimpleObject]([Compiler.SimpleObject(0) for n in range(p[3])])

We create new array and initialize its elements to zeros. We can also define array as list of expressions:

def p_arraydef(p):
    """ arraydef : '[' list_expr ']' """
    p[0] = Compiler.ConstantArray()
    p[0].Value = Array[Compiler.SimpleObject]([Compiler.SimpleObject(n) for n in p[2]])

def p_list_expr_first(p):
    """list_expr : expr"""
    p[0] = [p[1]]

def p_list_expr_next(p):
    """list_expr : list_expr ',' expr"""
    p[0] = p[1]
    p[0].append(p[3])

We match two brackets and extract all values separated with commas. We can also extract element from array:

def p_arrayelem(p):
    """ arrayelem : IDENT '[' expr ']' """
    p[0] = Compiler.ArrayAccess()
    p[0].Name = p[1]
    p[0].Index = p[3]

And we can assign value to array’s element:

def p_setarrayelem_variable(p):
    """setarrayelem : IDENT '[' expr ']' '=' expr ';'"""
    p[0] = Compiler.ArrayAssignment()
    p[0].VariableName = p[1]
    p[0].ValueExpression = p[6]
    p[0].Index = p[3]

That’s it when it comes to PLY code. Most of it is rather straightforward, we simply extract tokens from source code and store them in custom classes in order to handle them later. Nothing fancy here.

C#

Now it is time to write C# code to handle values.

Basic variables assignment

Let’s start with assigning variables:

using System.Linq;
using System.Linq.Expressions;

namespace Compiler
{
    public class Assignment : IStatement
    {
        public string VariableName { get; set; }
        public IExpression ValueExpression { get; set; }

        public Expression Accept(IVisitor visitor)
        {
            Variable dereferencedVariable = visitor.LocalVariables.SingleOrDefault(variable => variable.Name.Equals(VariableName));
            if (dereferencedVariable == null)
            {
                dereferencedVariable = new Variable {
                    Name = VariableName,
                    VariableReference = Expression.Variable(typeof (SimpleObject), VariableName)
                };

                visitor.LocalVariables.Add(dereferencedVariable);
            }

            return Expression.Assign(dereferencedVariable.VariableReference, ValueExpression.Accept(visitor));
        }

        public void FindLabels(IVisitor visitor)
        {
        }
    }
}

As we saw in previous part, we store all variables in visitor object. First, we try to find variable by name — if there is no such a variable, we simply create it in lines 16-19. Please also notice Expression.Variable in line 18 — this is the code which creates actual variable using lambdas. This lambda will be translated into ordinary variable definition of type SimpleObject with specified name, so this will be something like: SimpleObject name;.

Next, in line 24 we assign value to variable. We pass our visitor to ValueExpression, so the value will be calculated in runtime (if it is needed).

Assignment defines no labels, so FindLables simply does nothing. Please remember, that we use labels to perform gotos, we will handle them in next parts.

To sump up: this C# code creates new variable if it needed and assigns value to it. Variable is created in two places: one place is our local list of variables, so we can track them and handle them correctly, second place is lambda. The latter place is the place where all magic happens and the variable is actually created. Assignment is handled in only one place and is performed in runtime — whether it is translated to assigning constant or function call is up to ValueExpression.Accept result.

Please also notice, that we do not have scopes. All variables are stored in one collection, we do not verify whether these variables are properly scoped. We could do it here, but let’s not bother with it for now.

OK, let’s move on. We can assign constants to variables, so let’s see the C# code:

using System.Linq.Expressions;

namespace Compiler
{
    public class ConstantExpression : IExpression
    {
        public object Value { get; set; }

        public Expression Accept(IVisitor visitor)
        {
            System.Linq.Expressions.ConstantExpression param = Expression.Constant(Value, Value.GetType());
            return Expression.Call(typeof (YaulCompiler), "ConstructSimpleObject", null, new Expression[] { param });
        }

        public void FindLabels(IVisitor visitor)
        {
        }
    }
}

No magic here — since we have parsed value from PLY, we can simply create a lambda representing constant (line 11) with correct type (determined at runtime by examining the value, it is not done by the parser, however, it could be). Next, we return a lambda representing call to static function of YaulCompiler which simply creates new object with value and returns it. We could replace this lambda will direct call to SimpleObject constructor, however, using helper function is better, since we have exactly one place of variables creation.

Let’s now see how we can assign one variable to another. In order to do that, we need to dereference existing variable. Here is the code:

using System.Linq;
using System.Linq.Expressions;

namespace Compiler
{
    public class VariableDereference : IExpression
    {
        public string Name { get; set; }
        
        public Expression Accept(IVisitor visitor)
        {
            var variable = visitor.LocalVariables.FirstOrDefault(v => v.Name.Equals(Name));
            if (variable != null)
            {
                return variable.VariableReference;
            }
            throw new VariableNotInitializedException(Name);
        }

        public void FindLabels(IVisitor visitor)
        {
        }
    }
}

First, we traverse list of variables and try to find variable matching by name. If we found one, we return it — in other case we throw an exception describing the problem.

So now we are able to assign literal to variable and variable to variable. Let’s handle some more sophisticated scenarios.

Arrays

Let’s start with empty array:

using System.Linq.Expressions;

namespace Compiler
{
    public class ConstantArray : IExpression
    {
        public object Value { get; set; }

        public Expression Accept(IVisitor visitor)
        {
            return Expression.Constant(new SimpleObject(Value));
        }

        public void FindLabels(IVisitor visitor)
        {
        } 
    }
}

We created an actual array of values in Python code, so now we only need to create a constant representing the object. Please notice, that Expression.Constant means a predefined value, not a constant like const int. We don’t care whether this array is with predefined size or created from expression list — we have all the values provided by PLY and we can create the array.

When it comes to accessing array elements, we have the following code:

using System.Linq;
using System.Linq.Expressions;

namespace Compiler
{
    public class ArrayAccess : IExpression
    {
        public string Name { get; set; }
        public IExpression Index { get; set; }

        public Expression Accept(IVisitor visitor)
        {
            var argument = Index.Accept(visitor);
            return Expression.Call(typeof (SimpleObject), "GetElement", null, new Expression[]{
                visitor.LocalVariables.Single(variable => variable.Name.Equals(Name))
                        .VariableReference,
                argument
            });
        }

        public void FindLabels(IVisitor visitor)
        {
        }
    }
}

In line 13 we extract the index of element. Since the index might be a value calculated in runtime, we simply transform it to lambdas and call helper method from SimpleObject (line 14). Please also notice that we don’t check whether we have variable defined or not — in the latter situation we will throw an exception. Here you can see why it is a good idea to have custom exception types — getting NullReferenceException explains much less than custom type.

Let’s now see the code for changing array’s element:

using System.Linq;
using System.Linq.Expressions;

namespace Compiler
{
    public class ArrayAssignment : IStatement
    {
        public string VariableName { get; set; }
        public IExpression ValueExpression { get; set; }
        public IExpression Index { get; set; }

        public Expression Accept(IVisitor visitor)
        {
            Variable dereferencedVariable = visitor.LocalVariables.FirstOrDefault(variable => variable.Name.Equals(VariableName));

            if (dereferencedVariable == null)
            {
                throw new VariableNotInitializedException(VariableName);
            }

            return Expression.Call(typeof(SimpleObject), "SetElement", null, new Expression[] {
                dereferencedVariable.VariableReference,
                Index.Accept(visitor),
                ValueExpression.Accept(visitor)
            });
        }

        public void FindLabels(IVisitor visitor)
        {
        }
    }
}

First, we look for variable. If we can’t find it — we throw. Next, we call helper method SetElement and pass all required arguments — array variable, array index, and value for element.

Arithmetic

There is one more thing which we can cover today — binary operations. In PLY we defined a function for parsing binary operations like addition, here is C# code for performing actual calculations:

using System;
using System.Linq.Expressions;

namespace Compiler
{
    public class BinaryOperation : IExpression
    {
        public IExpression Left { get; set; }
        public IExpression Right { get; set; }
        public string Sign { get; set; }

        public Expression Accept(IVisitor visitor)
        {
            var leftSide = Left.Accept(visitor);
            var rightSide = Right.Accept(visitor);

            switch (Sign)
            {
                case "+":
                    return Expression.Add(leftSide, rightSide);
                case "-":
                    return Expression.Subtract(leftSide, rightSide);
                case "*":
                    return Expression.Multiply(leftSide, rightSide);
                case "/":
                    return Expression.Divide(leftSide, rightSide);
                case "%":
                    return Expression.Modulo(leftSide, rightSide);
                case "<":
                    return Expression.LessThan(leftSide, rightSide);
                case "<=":
                    return Expression.LessThanOrEqual(leftSide, rightSide);
                case ">":
                    return Expression.GreaterThan(leftSide, rightSide);
                case ">=":
                    return Expression.GreaterThanOrEqual(leftSide, rightSide);
                case "==":
                    return Expression.Equal(leftSide, rightSide);
                case "!=":
                    return Expression.NotEqual(leftSide, rightSide);
                default:
                    throw new InvalidOperationException(string.Format("Incorrect operation sign: {0}", Sign));
            }
        }

        public void FindLabels(IVisitor visitor)
        {
        }
    }
}

A bit more code than in other snippets, however, there is nothing fancy here. We simply try to match operator and create correct lambda for performing the operation. You can also notice, that we handle much more operators than in PLY (namely, comparison operators) — we will explain them in next parts.

Summary

We are now able to assign values to variables. In next part we are going to examine if construct.

]]>
https://blog.adamfurmanek.pl/2016/08/13/yet-another-useless-language-part-5/feed/ 1
Yet Another Useless Language Part 4 — Compiler https://blog.adamfurmanek.pl/2016/08/06/yet-another-useless-language-part-4/ https://blog.adamfurmanek.pl/2016/08/06/yet-another-useless-language-part-4/#comments Sat, 06 Aug 2016 08:00:46 +0000 https://blog.adamfurmanek.pl/?p=1776 Continue reading Yet Another Useless Language Part 4 — Compiler]]>

This is the fourth part of the YAUL series. For your convenience you can find other parts in the table of contents in Part 1 — Introduction

Today we are going to take a look at the basics of compilation process.

AST

In order to execute code we are going to parse source code using PLY and generate AST. Next, we are going to parse this AST and generate lambdas. Let’s take the following code:

function isPrime(number){
	temp = 2;
	while(temp < number){
		if(number % temp == 0){
			return 0;
		}
		temp = temp + 1;
		if(temp >= number){
			break;
		}
	}

	return 1;
}

function checkPrimes(low, up){
	temp = low;
	while(temp <= up){
		if(isPrime(temp) == 1){
			print temp;
		}
		temp = temp + 1;
		continue;
	}
}



function doWork(){
	checkPrimes(2,100);
	jump target!;
target!
	return;
}


doWork();

This code does nothing important, however, it uses almost all features of our language. The AST for this application is:

FunctionStatementBlock = {
	"Statements" : [
		ProcedureCall = {
			"FunctionCall" : FunctionCall = {
				"Name" : "doWork",
				"Arguments" : []
			}
		}
	],
	"Functions" : [
		FunctionDefinition = {
			"Body" : StatementBlock = {
				"Statements" : [
					Assignment = {
						"VariableName" : "temp",
						"ValueExpression" : ConstantExpression = {
							"Value" : 2
						}
					}, 
					WhileLoop = {
						"Condition" : BinaryOperation = {
							"Left" : VariableDereference = {
								"Name" : "temp"
							},
							"Right" : VariableDereference = {
								"Name" : "number"
							},
							"Sign" : "<"
						},
						"Body" : StatementBlock = {
							"Statements" : [
								If = {
									"Condition" : BinaryOperation = {
										"Left" : BinaryOperation = {
											"Left" : VariableDereference = {
												"Name" : "number"
											},
											"Right" : VariableDereference = {
												"Name" : "temp"
											},
											"Sign" : "%"
										},
										"Right" : ConstantExpression = {
											"Value" : 0
										},
										"Sign" : "=="
									},
									"TrueBlock" : StatementBlock = {
										"Statements" : [
											Return = {
												"ReturnedValue" : {
													"Value" : 0
												}
											}
										]
									},
									"FalseBlock" : StatementBlock = {
										"Statements" : null
									}
								}, 
								Assignment = {
									"VariableName" : "temp",
									"ValueExpression" : BinaryOperation = {
										"Left" : VariableDereference = {
											"Name" : "temp"
										},
										"Right" : ConstantExpression = {
											"Value" : 1
										},
										"Sign" : "+"
									}
								}, 
								If = {
									"Condition" : BinaryOperation = {
										"Left" : VariableDereference = {
											"Name" : "temp"
										},
										"Right" : VariableDereference = {
											"Name" : "number"
										},
										"Sign" : ">="
									},
									"TrueBlock" : StatementBlock = {
										"Statements" : [
											CompilerBreak = {
											}
										]
									},
									"FalseBlock" : StatementBlock = {
										"Statements" : null
									}
								}
							]
						}
					}, 
					Return = {
						"ReturnedValue" : {
							"Value" : 1
						}
					}
				]
			},
			"Parameters" : [
				Parameter = {
					"Name" : "number"
				}
			],
			"Name" : "isPrime"
		}, 
		FunctionDefinition = {
			"Body" : StatementBlock = {
				"Statements" : [
					Assignment = {
						"VariableName" : "temp",
						"ValueExpression" : VariableDereference = {
							"Name" : "low"
						}
					}, 
					WhileLoop = {
						"Condition" : BinaryOperation = {
							"Left" : VariableDereference = {
								"Name" : "temp"
							},
							"Right" : VariableDereference = {
								"Name" : "up"
							},
							"Sign" : "<="
						},
						"Body" : StatementBlock = {
							"Statements" : [
								If = {
									"Condition" : BinaryOperation = {
										"Left" : FunctionCall = {
											"Name" : "isPrime",
											"Arguments" : [
												VariableDereference = {
													"Name" : "temp"
												}
											]
										},
										"Right" : ConstantExpression = {
											"Value" : 1
										},
										"Sign" : "=="
									},
									"TrueBlock" : StatementBlock = {
										"Statements" : [
											ProcedureCall = {
												"FunctionCall" : FunctionCall = {
													"Name" : "_print",
													"Arguments" : [
														VariableDereference = {
															"Name" : "temp"
														}
													]
												}
											}
										]
									},
									"FalseBlock" : StatementBlock = {
										"Statements" : null
									}
								}, 
								Assignment = {
									"VariableName" : "temp",
									"ValueExpression" : BinaryOperation {
										"Left" : VariableDereference = {
											"Name" : "temp"
										},
										"Right" : ConstantExpression = {
											"Value" : 1
										},
										"Sign" : "+"
									}
								}, 
								Continue = {}
							]
						}
					}
				]
			},
			"Parameters" : [
				Parameter = {
					"Name" : "low"
				}, 
				Parameter = {
					"Name" : "up"
				}
			],
			"Name" : "checkPrimes"
		}, 
		FunctionDefinition = {
			"Body" : StatementBlock = {
				"Statements" : [
					ProcedureCall = {
						"FunctionCall" : FunctionCall = {
							"Name" : "checkPrimes",
							"Arguments" : [
								ConstantExpression = {
									"Value" : 2
								}, 
								ConstantExpression = {
									"Value" : 100
								}
							]
						}
					}, 
					Jump = {
						"Target" : Label = {
							"Name" : "target",
							"Target" : null
						}
					}, 
					Label = {
						"Name" : "target",
						"Target" : null
					}, 
					Return = {
						"ReturnedValue" : null
					}
				]
			},
			"Parameters" : [],
			"Name" : "doWork"
		}
	]
}

Lots of code for such a simple application. However, if you try to decipher this structure you will see that it indeed represents whole application.

We will generate this AST using PLY. Having this AST we could easily try to execute it by writing custom processor — we would need to track instruction pointer, variables’ states, call stacks and all that stuff. But in order to generate lambdas we will use visitor pattern to traverse the tree and generate the code.

Compiler application

Let’s write some C# code. We start with the following:

using Compiler;
using System.IO;

namespace YetAnotherUselessLanguage
{
    class Program
    {
        static void Main(string[] args)
        {
            var parser = new Parser();
            var tree = parser.Parse(File.ReadAllText(args[0]));
            var scriptExpression = YaulCompiler.CreateExpressionFromAST(tree);
            YaulCompiler.ExecuteScript(scriptExpression);
        }
    }
}

This simple application accepts one command line argument — the path to the source code file. After reading the filwe, we call Parser‘s method to parse the code (we will implement Parser later in this series), next we call Compiler‘s methods to create lambdas and execute the script. Here is the Compiler class:

using System;
using System.Linq.Expressions;
using System.Reflection;
using System.Reflection.Emit;

namespace Compiler
{
    public static class YaulCompiler
    {
        public static Expression CreateExpressionFromAST(FunctionStatementBlock astTree)
        {
            return astTree.Accept(new ExpressionGenerator());
        }

        public static void ExecuteScript(Expression expression)
        {
            Expression.Lambda< Action>(expression).Compile()();
        }

        public static SimpleObject ConstructSimpleObject(int value)
        {
            return new SimpleObject(value);
        }

        public static SimpleObject ConstructSimpleObject(string value)
        {
            return new SimpleObject(value);
        }
    }
}

Pretty simple. We create ExpressionGenerator and execute Accept method. ExpressionGenerator visitor is pretty simple:

using System.Collections.Generic;
using System.Linq.Expressions;
using LExpression = System.Linq.Expressions.Expression;

namespace Compiler
{
    public interface IVisitor
    {
        IList< Variable> LocalVariables { get; set; }
        
        IList< PreparedFunction> Functions { get; set; }

        IList< LExpression> BreakExpressions { get; }
        IList< LExpression> ContinueExpressions { get; }
        IList< LabelTarget> Labels { get; set; }

        LabelTarget ReturnLabel { get; set; }
    }
}

using System.Collections.Generic;
using System.Linq.Expressions;
using LExpression = System.Linq.Expressions.Expression;

namespace Compiler
{

    class ExpressionGenerator : IVisitor
    {
        public IList< Variable> LocalVariables { get; set; }
        public IList< PreparedFunction> Functions { get; set; }
        public IList< LExpression> BreakExpressions { get; private set; }
        public IList< LExpression> ContinueExpressions { get; private set; }
        public IList< LabelTarget> Labels { get; set; }
        public LabelTarget ReturnLabel { get; set; }

        public ExpressionGenerator()
        {
            LocalVariables = new List< Variable>();
            LocalVariables = new List< Variable>();
            BreakExpressions = new List< LExpression>();
            ContinueExpressions = new List< LExpression>();
            Labels = new List< LabelTarget>();
        }
    }
}

Our visitor will be used to hold some of parsed instructions. We will use it to maintain local variables, nested blocks, labels etc.

Next time we are going to handle variables declarations and assignments.

]]>
https://blog.adamfurmanek.pl/2016/08/06/yet-another-useless-language-part-4/feed/ 1
Yet Another Useless Language Part 3 — Variables https://blog.adamfurmanek.pl/2016/07/30/yet-another-useless-language-part-3/ https://blog.adamfurmanek.pl/2016/07/30/yet-another-useless-language-part-3/#comments Sat, 30 Jul 2016 08:00:29 +0000 https://blog.adamfurmanek.pl/?p=1769 Continue reading Yet Another Useless Language Part 3 — Variables]]>

This is the third part of the YAUL series. For your convenience you can find other parts in the table of contents in Part 1 — Introduction

Today we will implement class for holding values in YAUL language

Requirements

We do not have static typing, so every variable will need to be able to store three different types of values: integers, strings, and arrays. This means that all operations will need to check what type of variable they are working with — adding numbers will be different than adding strings. So let’s begin.

using System;
using System.Linq;

namespace Compiler
{
    public class SimpleObject
    {
        public object Value { get; set; }
        public Type ValueType { get { return Value.GetType(); } }

        public SimpleObject(object value)
        {
            var expression = value as ConstantExpression;
            Value = expression != null ? expression.Value : value;
        }

        public SimpleObject(SimpleObject other)
        {
            if(IsOfType< SimpleObject[]>(other))
            {
                Value = ValueOfType< SimpleObject[]>(other).Select(x => x).ToArray();
            }else if (IsOfType< string>(other))
            {
                Value = ValueOfType< string>(other).Substring(0);
            }
            else
            {
                Value = other.Value;
            }
        }

        public static bool AreOfType< T>(SimpleObject first, SimpleObject second)
        {
            return IsOfType< T>(first) && IsOfType< T>(second);
        }

        public static bool IsOfType< T>(SimpleObject @object)
        {
            return @object.ValueType == typeof (T);
        }

        public static T ValueOfType< T>(SimpleObject @object)
        {
            return ((T) @object.Value);
        }
	}
}

We have one field representing the value. We also have copy constructor — its implementation is in fact overly complicated (because we don’t need to explicitly copy integers or strings), but it shows how we would like to handle different types of value. We also have helper functions for checking types and values. We could simplify them, but let it be for now.

Accessing array

This is the code for accessing array elements:

public SimpleObject this[int key]
{
	get
	{
		if (IsOfType< SimpleObject[]>(this))
		{
			return ValueOfType< SimpleObject[]>(this)[key];
		}
		if (IsOfType< string>(this))
		{
			return new SimpleObject(ValueOfType< string>(this)[key].ToString());
		}

		throw new InvalidOperationException("Object is not an array");
	}
	set
	{
		if (IsOfType< SimpleObject[]>(this))
		{
			ValueOfType< SimpleObject[]>(this)[key] = value;
		}
		else if (IsOfType< string>(this))
		{
			var array = ValueOfType< string>(this).ToCharArray();
			array[key] = ((string) value.Value)[0];
			Value = new string(array);
		}else
		{
			throw new InvalidOperationException("Object is not an array");
		}
	}
}

We explicitly check types when accessing values. For arrays we extract value by element index, for strings we extract specified character. We do not handle more sophisticated cases here, like surruogate pairs.

We also define methods to access elements using SimpleObject as an index variable:

public static SimpleObject GetElement(SimpleObject target, SimpleObject index)
{
	return target[(int)index.Value];
}

public static void SetElement(SimpleObject target, SimpleObject index, SimpleObject value)
{
	target[(int)index.Value] = value;
}

Basic operators

Let’s now implement basic functions:

public static SimpleObject operator +(SimpleObject a, SimpleObject b)
{
	if (AreOfType< string>(a, b))
	{
		return new SimpleObject(ValueOfType< string>(a) + ValueOfType< string>(b));
	}
	
	if (AreOfType< int>(a, b))
	{
		return new SimpleObject(ValueOfType< int>(a) + ValueOfType< int>(b));
	}
	
	if (AreOfType< SimpleObject[]>(a,b))
	{
		return new SimpleObject(ValueOfType< SimpleObject[]>(a).Concat(ValueOfType< SimpleObject[]>(b)).ToArray());
	}

	throw new IncorrectTypesException(a, b);
}

public static SimpleObject operator -(SimpleObject a, SimpleObject b)
{
	if (AreOfType< int>(a, b))
	{
		return new SimpleObject(ValueOfType< int>(a) - ValueOfType< int>(b));
	}

	throw new IncorrectTypesException(a, b);
}

public static SimpleObject operator *(SimpleObject a, SimpleObject b)
{
	if (AreOfType< int>(a, b))
	{
		return new SimpleObject(ValueOfType< int>(a) * ValueOfType< int>(b));
	}
	throw new IncorrectTypesException(a, b);
}

public static SimpleObject operator /(SimpleObject a, SimpleObject b)
{
	if (AreOfType< int>(a, b))
	{
		return new SimpleObject(ValueOfType< int>(a) / ValueOfType< int>(b));
	}

	throw new IncorrectTypesException(a, b);
}

public static SimpleObject operator %(SimpleObject a, SimpleObject b)
{
	if (AreOfType< int>(a, b))
	{
		return new SimpleObject(ValueOfType< int>(a) % ValueOfType< int>(b));
	}

	throw new IncorrectTypesException(a, b);
}

All of them are rather trivial. Addition for numbers works as usual, for strings it joins them, for arrays it catenates them. Other operators works only for numbers.

Comparisons

Comparison operators are pretty trivial:

public static bool operator <(SimpleObject a, SimpleObject b)
{
	if (AreOfType< string>(a, b))
	{
		return ValueOfType< string>(a).CompareTo(ValueOfType< string>(b)) <  0;
	}

	if (AreOfType< int>(a, b))
	{
		return ValueOfType< int>(a) < ValueOfType< int>(b);
	}

	if (AreOfType< SimpleObject[]>(a, b))
	{
		var first = ValueOfType< SimpleObject[]>(a);
		var second = ValueOfType< SimpleObject[]>(b);
		if (first.Length != second.Length)
		{
			return first.Length < second.Length;
		}

		for (int i = 0; i < first.Length; ++i)
		{
			if (first[i] != second[i])
			{
				return first[i] < second[i];
			}
		}

		return false;
	}

	throw new IncorrectTypesException(a, b);
}

public static bool operator >(SimpleObject a, SimpleObject b)
{
	return false == (a < b) && (a != b);
}

public static bool operator <=(SimpleObject a, SimpleObject b)
{
	return a < b || a == b;
}

public static bool operator >=(SimpleObject a, SimpleObject b)
{
	return false == (a < b);
}

public static bool operator ==(SimpleObject a, SimpleObject b)
{
	if (AreOfType< string>(a, b))
	{
		return ValueOfType< string>(a) == ValueOfType< string>(b);
	}

	if (AreOfType< int>(a, b))
	{
		return (int)a.Value == (int)b.Value;
	}

	if (AreOfType< SimpleObject[]>(a, b))
	{
		return ValueOfType< SimpleObject[]>(a).SequenceEqual(ValueOfType< SimpleObject[]>(b));
	}

	return false;
}

public static bool operator !=(SimpleObject a, SimpleObject b)
{
	return false == (a == b);
}

Numbers and strings are compared using .NET functions. For arrays we first compare their length, and then we compare their contents.

True or false

We can also convert objects to boolean values:

public static bool operator true(SimpleObject @object)
{
	return IsTrue(@object);
}

private static bool IsTrue(SimpleObject @object)
{
	if (IsOfType< int>(@object))
	{
		return ValueOfType< int>(@object) != 0;
	}

	if (IsOfType< string>(@object))
	{
		return false == string.IsNullOrEmpty(ValueOfType< string>(@object));
	}

	if (IsOfType< SimpleObject[]>(@object))
	{
		return (ValueOfType< SimpleObject[]>(@object)).Length != 0;
	}

	return true;
}

public static bool operator false(SimpleObject @object)
{
	return false == IsTrue(@object);
}

public static implicit operator bool(SimpleObject @object)
{
	return IsTrue(@object);
}

public static bool operator !(SimpleObject @object)
{
	return false == IsTrue(@object);
}

Number is true only when it is not zero. String is true when it is not null or empty. Array is true when it is not empty.

Equality

For completeness we implement equality operators:

protected bool Equals(SimpleObject other)
{
	return AreOfType< SimpleObject[]>(this, other)
			   ? (ValueOfType< SimpleObject[]>(this).SequenceEqual(ValueOfType< SimpleObject[]>(other)))
			   : Value.Equals(other.Value);
}

public override bool Equals(object obj)
{
	if (ReferenceEquals(null, obj)) return false;
	if (ReferenceEquals(this, obj)) return true;
	if (obj.GetType() != GetType()) return false;
	return Equals((SimpleObject)obj);
}

public override int GetHashCode()
{
	return (Value != null ? Value.GetHashCode() : 0);
}

And we are done. To be sure that we implemented everything correctly we should implement tests.

Summary

Now we are able to perform basic operations on our values. Since every variable will be represented using just this one type, we don’t need to handle different representation cases and implement casting.

]]>
https://blog.adamfurmanek.pl/2016/07/30/yet-another-useless-language-part-3/feed/ 1
Yet Another Useless Language Part 2 — Grammar https://blog.adamfurmanek.pl/2016/07/23/yet-another-useless-language-part-2/ https://blog.adamfurmanek.pl/2016/07/23/yet-another-useless-language-part-2/#comments Sat, 23 Jul 2016 08:00:11 +0000 https://blog.adamfurmanek.pl/?p=1765 Continue reading Yet Another Useless Language Part 2 — Grammar]]>

This is the second part of the YAUL series. For your convenience you can find other parts in the table of contents in Part 1 — Introduction

Last time we described features of a language we are going to write. Today we are going to define its grammar using EBNF-like notation.

Notation

We will describe the notation using the following syntax:

  • Elements can be written in any case
  • Optional elements are written in square brackets: [Optional]
  • One or more elements are written in curly brackets: {one_or_more}
  • Literals are written in quotation marks: 'literal'
  • Literals are case insensitive
  • Question marks indicates parts described in natural language

Identifiers

We start with defining identifiers. We will handle only latin letters, digits, and underscores. Variable’s name will need to start with letter or underscore:

IDENT = letter_or_underscore , [ { letter_or_underscore_or_digit } ] ;
letter_or_underscore = letter | underscore ;
letter_or_underscore_or_digit = letter_or_underscore | digit;
underscore = '_' ;
digit = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' ;
letter = 'a' | 'b' | 'c' | 'd' | 'e' | 'f' | 'g' | 'h' | 'I' | 'j' | 'k' | 'l' | 'm' | 'n' | 'o' | 'p' | 'q' | 'r' | 's' | 't' | 'u' | 'v' | 'w' | 'x' | 'y' | 'z' | 'A' | 'B' | 'C' | 'D' | 'E' | 'F' | 'G' | 'H' | 'I' | 'J' | 'K' | 'L' | ' M' | 'N' | 'O' | 'P' |' Q' | 'R '| 'S' | 'T' | 'U' | 'V' | 'W' | 'X' | 'Y' | 'Z' ;

We can see that IDENT is a letter_or_underscore, optionally followed by letters, digits, or underscores. We also specified all possible letters and digits we can handle.

Trivial values

Numbers are just digits, we don’t handle fractions:

NUMBER = {digit} ;

Strings are just any printable characters delimited by double quotation marks:

STRING = '"' ; whatever , '"' ;
whatever = ? any_printable_character ? ;

Variables are just identifiers:

variable = IDENT ;

Literals are strings or numbers:

literal = STRING | NUMBER ;

We can modify variable’s value:

setvar = IDENT , '=' , expr , ';' ;

We can allocate new array:

emptyarray = 'NEW' , '[' , NUMBER , ']' ;

We can also allocate array using expressions:

arraydef = '[' list_expr ']'
list_expr = expr , [{ ',' , expr }] ;

We can get or set array’s element using:

arrayelem = IDENT , '[' , expr , ']' ;
setarrayelem = IDENT , '[' , expr , ']' , '=' , expr , ';' ;

Expression can be:

expr = expr , '+' , expr | expr , '-' , expr | expr , '*' , expr | expr , '/' , expr | expr , '%' , expr | proc_call | literal | variable | arrayelem | arraydef | emptyarray ;

We have if:

if_else = 'IF' , '(' , cond_expr , ')' , block_or_statement [ , 'ELSE' , block_or_statement ] ;
cond_expr = expr , '==' , expr | expr , '>=' , expr | expr , '<=' , expr | expr , '>' , expr | expr , '<' , expr | expr , '!=' , expr | '!' , expr ;
block_or_statement = block | statement ;

We have loop:

while = 'WHILE' , '(' , cond_expr , ')' , block_or_statement ;
break = 'BREAK' , ';' ;
continue = 'CONTINUE' , ';' ;

We can print value:

print = 'PRINT' , expr , ';' ;

We can define label — identifier ending with exclamation mark. We can also jump to it:

LABEL = letter_or_underscore , [ { letter_or_underscore_or_digit } , ] '!' ;
jump = 'JUMP' , LABEL , ';' ;

We can declare a function:

function_decl = 'function' , IDENT , '(' , [ list_param , ] ')' , block ;
list_param = [ list_param ',' , ] IDENT ;

We can define its body:

block = '{' , [ list_statement , ] '}' ;
list_statement = { statement } ;
statement = setvar | setarrayelem | if_else | while | proc_call | return | break | continue | print | jump | label ;

We can return value from a function:

return = 'RETURN' , [ expr , ] ';' ;

We can call functions:

proc_call = IDENT , '(' , [ list_expr , ] ')' , ';' ;

Finally, our program is a list of functions or statements:

start = program
program = list_function_statement ;
list_function_statement = [list_function_statement , ] function_decl | [list_function_statement , ] statement };

Summary

OK, we have our grammar. For now it is only for reference, since we will parse it in one of the last parts of this series. Next time we will write some code to represent values in memory and perform basic operations.

]]>
https://blog.adamfurmanek.pl/2016/07/23/yet-another-useless-language-part-2/feed/ 1