PLY – Random IT Utensils https://blog.adamfurmanek.pl IT, operating systems, maths, and more. Wed, 04 Apr 2018 23:17:48 +0000 en-US hourly 1 https://wordpress.org/?v=6.6.2 Yet Another Useless Language Part 10 — Parser https://blog.adamfurmanek.pl/2016/09/17/yet-another-useless-language-part-10/ https://blog.adamfurmanek.pl/2016/09/17/yet-another-useless-language-part-10/#comments Sat, 17 Sep 2016 08:00:16 +0000 https://blog.adamfurmanek.pl/?p=1811 Continue reading Yet Another Useless Language Part 10 — Parser]]>

This is the tenth part of the YAUL series. For your convenience you can find other parts in the table of contents in Part 1 — Introduction

We have our parser almost done, there are just few things left. Let’s begin.

Program structure

For now we focused on parsing separate constructions. Now it is high time to parse the whole application:

def p_start(p):
    """start : program"""
    p[0] = p[1]

def p_program_not_empty(p):
    """program  : list_function_statement"""
    p[0] = Compiler.FunctionStatementBlock()
    p[0].Statements = System.Collections.Generic.List[Compiler.IStatement] ( p[1][0] )
    p[0].Functions = System.Collections.Generic.List[Compiler.FunctionDefinition]( p[1][1] )

def p_program_empty(p):
    """program  : """
    p[0] = Compiler.FunctionStatementBlock()
    p[0].Statements = System.Collections.Generic.List[Compiler.IStatement] ( [] )
    p[0].Functions = System.Collections.Generic.List[Compiler.FunctionDefinition]( [] )

def p_list_function_statement_first_statement(p):
    """list_function_statement : statement"""
    p[0] = [p[1]], []

def p_list_function_statement_first_function(p):
    """list_function_statement : function_decl"""
    p[0] = [], [p[1]]
        
def p_list_function_statement_next_statement(p):
    """list_function_statement : list_function_statement statement"""
    p[0] = p[1]
    p[0][0].append(p[2])

def p_list_function_statement_next_function(p):
    """list_function_statement : list_function_statement function_decl"""
    p[0] = p[1]
    p[0][1].append(p[2])

Our application is simply a list of functions and statements. We go through the source code and collect all of them.

Error handling

We would like to be able to diagnose common parsing errors, like missing semicolon, wrong syntax, etc. This is called parser resynchronization. Here is the code:

# resynchronization
    
def p_block_error(p):
    """block : '{' error '}'"""
    pass

def p_block_error_multiple(p):
    """block : '{' list_statement error '}'"""
    pass

def p_statement_error(p):
     """statement : error ';'"""

def p_function_decl_with_params2(p):
    """function_decl : error FUNCTION IDENT '(' list_param ')' block """
    p[0] = Compiler.FunctionDefinition()
    p[0].Name = p[3]
    p[0].Body = p[7]
    p[0].Parameters = System.Collections.Generic.List[Compiler.Parameter] (p[5])

def p_error(p):
    global errors
    if p:
        errors.append("Line {0:3}:\tSyntax error - unexpected '{1}' ".format(p.lineno, str(p.value)))
    else:
        errors.append("Syntax error - unexpected EOF ")

    if p.type == '}':
        yacc.errok()  # skip additional }

    if p.type == 'FUNCTION':  # resynchronize on function keyword
        return p

This is just a basic error handling. In practice we would like to have better error descriptions, like what was expected in the code, but for now this is sufficient.

Missing stuff

And here is other parser’s missing stuff:

def initialize(plyBasePath):
    global yacc
    global lex
    global sys
    global clr
    global imp
    global parser
    global lexer
    global System
    global Compiler
    global ConstantExpression

    import imp
    import sys
    import clr
    import os

    clr.AddReference("System")
    clr.AddReference("System.Core")
    clr.AddReference("Compiler");
    import System
    import Compiler

    lex = imp.load_source('ply.lex', plyBasePath + '\\lex.py')
    yacc = imp.load_source('ply.yacc',  plyBasePath + '\\yacc.py')
    lexer = lex.lex(module = sys.modules[__name__], debug=1)
    parser = yacc.yacc(module = sys.modules[__name__], debug=1)


def parse(text):
    return parser.parse(text, lexer=lexer)

if __name__ == '__main__':
    
    if len(sys.argv) != 2:
        print("Not valid number of arguments")
        sys.exit(1)

    filename = sys.argv[1]
    file_content = open(filename, "r").read()

    astTree = parse(file_content)

    if errors:
        print "INTERPRETATION FAILED!"
        for error in errors:
            print error
    else:

        try:
            scriptExpression = Compiler.CreateExpressionFromAST(astTree);
            Compiler.ExecuteScript(scriptExpression);
        except Exception as e:
            print "INTERPRETATION FAILED!"
            print e

        
    
    a=raw_input()

Summary

This is it when it comes to PLY’s part. Next time we are going to finish C#’s part and our language will be complete.

]]>
https://blog.adamfurmanek.pl/2016/09/17/yet-another-useless-language-part-10/feed/ 1
Yet Another Useless Language Part 8 — Function https://blog.adamfurmanek.pl/2016/09/03/yet-another-useless-language-part-8/ https://blog.adamfurmanek.pl/2016/09/03/yet-another-useless-language-part-8/#comments Sat, 03 Sep 2016 08:00:27 +0000 https://blog.adamfurmanek.pl/?p=1803 Continue reading Yet Another Useless Language Part 8 — Function]]>

This is the eighth part of the YAUL series. For your convenience you can find other parts in the table of contents in Part 1 — Introduction

Today we are going to handle functions in YAUL. Let’s go.

Introduction

We are going to represent functions as blocks of code. Since we want to handle local variables, we will need to add them to scope and remove after creating the function. We also need to handle names duplicates and conflicts with functions defined in standard library. The code will be little longer today.

Parser

Let’s begin with PLY:

def p_function_decl_with_params(p):
    """function_decl : FUNCTION IDENT '(' list_param ')' block """
    p[0] = Compiler.FunctionDefinition()
    p[0].Name = p[2]
    p[0].Body = p[6]
    p[0].Parameters = System.Collections.Generic.List[Compiler.Parameter] (p[4])

def p_function_decl_with_params2(p):
    """function_decl : error FUNCTION IDENT '(' list_param ')' block """
    p[0] = Compiler.FunctionDefinition()
    p[0].Name = p[3]
    p[0].Body = p[7]
    p[0].Parameters = System.Collections.Generic.List[Compiler.Parameter] (p[5])
Comp
def p_function_decl_without_params(p):
    """function_decl : FUNCTION IDENT '(' ')' block """
    p[0] = Compiler.FunctionDefinition()
    p[0].Name = p[2]
    p[0].Body = p[5]
    p[0].Parameters = System.Collections.Generic.List[Compiler.Parameter] ( [] )

It looks very similar to code for loops. One additional thing is list of parameters:

def p_list_param_first(p):
    """list_param : IDENT """
    param = Compiler.Parameter()
    param.Name = p[1]
    p[0] = [param]    

def p_list_param_next(p):
    """list_param : list_param ',' IDENT """
    p[0] = p[1]
    param = Compiler.Parameter()
    param.Name = p[3]
    p[0].append(param)

However, there is a very special keyword for functions: return. We can return value or just exit the function:

def p_return_void(p):
    """return : RETURN ';' """
    p[0] = Compiler.Compiler.Return()

def p_return_value(p):
    """return : RETURN expr ';' """
    p[0] = Compiler.Return()
    p[0].ReturnedValue = p[2]

What’s more, we would like to be able to call our functions:

def p_proc_call(p):
    """proc_call : funct_call ';' """
    p[0] = Compiler.ProdecureCall()
    p[0].FunctionCall = p[1]

def p_funct_call_normal_none_params(p):
    """funct_call : IDENT '(' ')'"""
    p[0] = Compiler.FunctionCall()
    p[0].Name = p[1]
    p[0].Arguments = System.Collections.Generic.List[Compiler.IExpression]()

def p_funct_call_normal_with_params(p):
    """funct_call : IDENT '(' list_expr ')'"""
    p[0] = Compiler.FunctionCall()
    p[0].Name = p[1]
    p[0].Arguments = System.Collections.Generic.List[Compiler.IExpression](p[3])

This is it, nothing tricky here.

C#

We start with function definition:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

namespace Compiler
{
    public class FunctionDefinition : ISemanticObject
    {
        public StatementBlock Body { get; set; }
        public List< Parameter> Parameters { get; set; }

        public string Name { get; set; }

        public Expression Accept(IVisitor visitor)
        {
            var parameterExpressions = CreateParamsExpression(visitor);
            AddParamsToScope(visitor, parameterExpressions);
            
            DefineReturnLabel(visitor);

            var bodyExpression = Body.Accept(visitor);

            RemoveParamsFromScope(visitor, parameterExpressions);

            var usedVariables = GetUsedVariables(visitor);

            var newBodyExpression = DecorateWithReturnLabel(visitor, bodyExpression, usedVariables);

            var functionExpression = CreateFunctionExpression(visitor, newBodyExpression, parameterExpressions);

            visitor.LocalVariables.Clear();
            
            return functionExpression;
        }

        private BinaryExpression CreateFunctionExpression(IVisitor visitor, BlockExpression newBodyExpression,
                                                          IEnumerable< ParameterExpression> parameterExpressions)
        {
            var methodVar = (ParameterExpression) visitor.Functions.Single(func => func.Name.Equals(Name)).Function;
            var functionExpression = Expression.Assign(methodVar,
                                                       Expression.Lambda(newBodyExpression, Name, parameterExpressions));
            return functionExpression;
        }

        private static BlockExpression DecorateWithReturnLabel(IVisitor visitor, Expression bodyExpression, IEnumerable< ParameterExpression> usedVariables)
        {
            Expression defaultValue = Expression.Call(typeof (YaulCompiler), "ConstructSimpleObject", null,
                                                      new Expression[] {Expression.Constant("", typeof (string))});
            var labelExpression = Expression.Label(visitor.ReturnLabel, defaultValue);

            var bodyAndLabel = new List< Expression> {bodyExpression, labelExpression};            
            return Expression.Block(usedVariables, bodyAndLabel);
        }

        private static IEnumerable< ParameterExpression> GetUsedVariables(IVisitor visitor)
        {
            return visitor.LocalVariables.Select(variable => variable.VariableReference);
        }

        private static void RemoveParamsFromScope(IVisitor visitor, List< ParameterExpression> parameterExpressions)
        {
            for(int i = visitor.LocalVariables.Count - 1; i >= 0; --i)
            {
                if (parameterExpressions.Contains(visitor.LocalVariables[i].VariableReference))
                {
                    visitor.LocalVariables.RemoveAt(i);
                }
            }
        }

        private static void DefineReturnLabel(IVisitor visitor)
        {
            visitor.ReturnLabel = Expression.Label(typeof(SimpleObject));
        }

        private void AddParamsToScope(IVisitor visitor, IEnumerable< ParameterExpression> parameterExpressions)
        {
            var variables = parameterExpressions.Select(CreateVariableFromParam);
            foreach(var variable in variables)
            {
                visitor.LocalVariables.Add(variable);
            }
            
        }

        private Variable CreateVariableFromParam(ParameterExpression parameterExpression)
        {
            return new Variable
                {
                    Name = parameterExpression.Name,
                    VariableReference = parameterExpression
                };
        }

        private List< ParameterExpression> CreateParamsExpression(IVisitor visitor)
        {
            return Parameters.Select(parameter => (ParameterExpression) parameter.Accept(visitor)).ToList();
        }

        public void FindLabels(IVisitor visitor)
        {
            
        }

        public PreparedFunction Prepare()
        {
            var type = DetermineType();
            var variable = Expression.Variable(type, Name);

            return new PreparedFunction
                {
                    Function = variable,
                    Name = Name
                };

        }

        private Type DetermineType()
        {
            var parameterExpressions = Parameters.Select(parameter => (ParameterExpression) parameter.Accept(null)).ToList();
            
            LabelTarget returnLabel = Expression.Label(typeof (SimpleObject));
            Expression defaultValue = Expression.Call(typeof (YaulCompiler), "ConstructSimpleObject", null,
                                                      new Expression[] {Expression.Constant("", typeof (string))});

            var simpleBody = Expression.Label(returnLabel, defaultValue);

            var declaration = Expression.Lambda(simpleBody, Name, parameterExpressions);

            return declaration.Type;
        }
    }
}

Let’s examin Accept function. First, we create expressions for parameters. Next, we add them to local scope so they can be used as local variables. Next, we visit body. Since we have local variables in scope, body is free to use them. When it is done, we remove parameters from scope. Next, we create block for function, create expression holding our function, clear local parameters, and we are done.

PreparedFunction looks as follows:

using System.Linq.Expressions;

namespace Compiler
{
    public class PreparedFunction
    {
        public Expression Function { get; set; }
        public string Name { get; set; }
    }
}

Function’s parameters are defined as:

using System.Linq.Expressions;

namespace Compiler
{
    public class Parameter : ISemanticObject
    {
        public string Name { get; set; }
        public Expression Accept(IVisitor visitor)
        {
            return Expression.Parameter(typeof (SimpleObject), Name);
        }

        public void FindLabels(IVisitor visitor)
        {
        }
    }
}

And return instruction:

using System.Linq.Expressions;

namespace Compiler
{
    public class Return : IStatement
    {
        public IExpression ReturnedValue { get; set; }

        public Expression Accept(IVisitor visitor)
        {
            if (ReturnedValue != null)
            {
                var value = ReturnedValue.Accept(visitor);
                return Expression.Return(visitor.ReturnLabel, value, typeof(SimpleObject));
            }
            Expression defaultValue = Expression.Call(typeof (YaulCompiler), "ConstructSimpleObject", null, new Expression[] { Expression.Constant("", typeof(string)) });
            return Expression.Return(visitor.ReturnLabel, defaultValue, typeof(SimpleObject));
        }

        public void FindLabels(IVisitor visitor)
        {
        }
    }
}

Please note, that we always return something, but in case when we want to exit the function, we simply ignore the value.

Anod now the last part, calling functions:

using System.Linq.Expressions;

namespace Compiler
{
    public class ProdecureCall : IStatement
    {
        public FunctionCall FunctionCall;

        public Expression Accept(IVisitor visitor)
        {
            return FunctionCall.Accept(visitor);
        }

        public void FindLabels(IVisitor visitor)
        {
        }
    }
}

using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

namespace Compiler
{
    public class FunctionCall : IExpression
    {
        public string Name { get; set; }
        public IList< IExpression> Arguments { get; set; }

        public Expression Accept(IVisitor visitor)
        {
            var function = visitor.Functions.SingleOrDefault(func => func.Name.Equals(Name));
            if (function != null)
            {
                var argumentsExpression = Arguments.Select(argument => argument.Accept(visitor));
                var lambda = function;
                return Expression.Invoke(lambda.Function, argumentsExpression);
            }

            throw new MethodNotFoundException(Name);
        }

        public void FindLabels(IVisitor visitor)
        {
        }
    }
}

Summary

Function handling is the most difficult part in our language, however, there is no magic. Wee simply need to write a little more code to correctly handle and create function’s block. Next time we are going to handle standard library.

]]>
https://blog.adamfurmanek.pl/2016/09/03/yet-another-useless-language-part-8/feed/ 1
Yet Another Useless Language Part 7 — Loop https://blog.adamfurmanek.pl/2016/08/27/yet-another-useless-language-part-7/ https://blog.adamfurmanek.pl/2016/08/27/yet-another-useless-language-part-7/#comments Sat, 27 Aug 2016 08:00:46 +0000 https://blog.adamfurmanek.pl/?p=1798 Continue reading Yet Another Useless Language Part 7 — Loop]]>

This is the seventh part of the YAUL series. For your convenience you can find other parts in the table of contents in Part 1 — Introduction

Last time we implemented conditional expression. Today we are going to implement another basic imperative concept — loop.

Introduction

We are going to handle while loop. We will also handle common keywords allowing to control execution: break, continue, and goto.

Parser

As usual, we start with parser. First, loop:

def p_while(p):
    """while : WHILE '(' cond_expr ')' block_or_statement"""
    p[0] = Compiler.WhileLoop()
    p[0].Condition = p[3]
    p[0].Body = p[5]

Looks almost the same as if code. We simply check condition and execute block of code. Let’s see what is exactly the block_or_statement:

def p_block_or_statement_block(p):
    """block_or_statement : block"""
    p[0] = p[1]

def p_block_or_statement_statement(p):
    """block_or_statement : statement"""
    p[0] = p[1]

def p_block_empty(p):
    """block : '{' '}' """
    p[0] = Compiler.StatementBlock()
    p[0].Statements = System.Collections.Generic.List[Compiler.IStatement]([])

def p_block_not_empty(p):
    """block : '{' list_statement '}' """
    p[0] = Compiler.StatementBlock()
    p[0].Statements = System.Collections.Generic.List[Compiler.IStatement](p[2])

def p_list_statement_first(p):
    """list_statement : statement"""
    p[0] = [p[1]]
    

def p_list_statement_next(p):
    """list_statement : list_statement statement"""
    p[0] = p[1]
    p[0].append(p[2])
    

def p_statement(p):
    """statement : setvar
                 | setarrayelem
                 | if_else
                 | while
                 | proc_call
                 | return
                 | break
                 | continue
                 | print 
                 | jump
                 | label """
    p[0] = p[1]

The code should be obvious. We simply handle list of statements, where each statement is one of language’s instructions. Let’s now handle keywords for controlling loop:

def p_break(p):
    """break : BREAK ';' """
    p[0] = Compiler.Break()

def p_continue(p):
    """continue : CONTINUE ';' """
    p[0] = Compiler.Continue()

Looks pretty easy, just a keyword with semicolon at the end. Let’s now see the labels and goto:

def t_LABEL(t):
    r'[a-zA-Z_][a-zA-Z0-9_]*!'
    t.value = t.value.strip('!')
    return t

def p_label(p):
    """ label : LABEL"""
    p[0] = Compiler.Label()
    p[0].Name = p[1]

def p_jump(p):
    """ jump : JUMP LABEL ';' """
    p[0] = Compiler.Jump()
    p[0].Target = Compiler.Label();
    p[0].Target.Name = p[2]

Once again, nothing special.

C#

Let’s dig into C# code. First, loop:

using System;
using System.Linq.Expressions;

namespace Compiler
{
    public class WhileLoop : IStatement
    {
        public IExpression Condition { get; set; }
        public StatementBlock Body { get; set; }

        public WhileLoop()
        {
            Body = new StatementBlock();
        }

        public Expression Accept(IVisitor visitor)
        {
            LabelTarget breakLabel = Expression.Label();
            LabelTarget continueLabel = Expression.Label();
            Expression breakExpression = Expression.Break(breakLabel);
            Expression continueExpression = Expression.Continue(continueLabel);

            visitor.BreakExpressions.Add(breakExpression);
            visitor.ContinueExpressions.Add(continueExpression);

            var result = Expression.Loop(
                Expression.IfThenElse(
                    Condition.Accept(visitor), 
                    Expression.Block(
                        Body.Accept(visitor),
                        continueExpression
                    ),
                    breakExpression
                ), 
                breakLabel, 
                continueLabel);


            visitor.BreakExpressions.Remove(breakExpression);
            visitor.ContinueExpressions.Remove(continueExpression);

            return result;
        }

        public void FindLabels(IVisitor visitor)
        {
            Body.FindLabels(visitor);
        }
    }
}

First, before visiting the body we create instructions for break and continue instructions. We add them to visitor, so it knows which instructions use to control the loop. Next, we create a loop — it consists of expression for checking condition, expression for loop’s block, and expressions for controlling the loop. Check MSDN for more details. We also need to handle labels in loop’s body.

Let’s now see blocks of code:

using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

namespace Compiler
{
    public class StatementBlock : ISemanticObject
    {
        public List< IStatement> Statements { get; set; }
        
        public Expression Accept(IVisitor visitor)
        {
            FindLabels(visitor);
            var generatedStatements = GenerateContent(visitor);
            
            if (generatedStatements.Count > 0)
            {
                return Expression.Block(generatedStatements);
            }

            return Expression.Empty();
        }

        private IList< Expression> GenerateContent(IVisitor visitor)
        {
            return Statements.Select(statement => statement.Accept(visitor)).Where(statement => statement != null).ToList();
        }

        public void FindLabels(IVisitor visitor)
        {
            foreach (var each in Statements)
            {
                each.FindLabels(visitor);
            }
        }
    }
}

No magic here. We simply iterate over statements and collect them into one block. Statement is:

namespace Compiler
{
    public interface IStatement : ISemanticObject
    {
        
    }
}

Well, it is very simple. Statements are just particular instructions and they are handled somewhere else, in implementations.

OK, let’s now see control keywords:

using System;
using System.Linq;
using System.Linq.Expressions;

namespace Compiler
{
    public class Break : IStatement
    {
        public Expression Accept(IVisitor visitor)
        {
            if (visitor.BreakExpressions.Count == 0)
            {
                throw new InvalidOperationException("Cannot break - must be inside loop");
            }

            return visitor.BreakExpressions.Last();
        }

        public void FindLabels(IVisitor visitor)
        {
        }
    }
}

using System;
using System.Linq;
using System.Linq.Expressions;

namespace Compiler
{
    public class Continue : IStatement
    {
        public Expression Accept(IVisitor visitor)
        {
            if (visitor.ContinueExpressions.Count == 0)
            {
                throw new InvalidOperationException("Cannot continue - must be inside loop");
            }

            return visitor.ContinueExpressions.Last();
        }

        public void FindLabels(IVisitor visitor)
        {
        }
    }
}

We simply extract most recent instructions from visitor or throw exceptions if they do not exist.

And now the last part, gotos:

using System.Linq.Expressions;

namespace Compiler
{
    public class Label : IStatement
    {
        public string Name { get; set; }
        public LabelTarget Target { get; set; }

        public Expression Accept(IVisitor visitor)
        {
            return Expression.Label(Target);
        }

        public void FindLabels(IVisitor visitor)
        {
            Target = Expression.Label(Name);
            visitor.Labels.Add(Target);
        }
    }
}

using System.Linq;
using System.Linq.Expressions;

namespace Compiler
{
    public class Jump : IStatement
    {
        public Label Target { get; set; }

        public Expression Accept(IVisitor visitor)
        {
            return Expression.Goto(visitor.Labels.First(label => label.Name == Target.Name));
        }

        public void FindLabels(IVisitor visitor)
        {
        }
    }
}

Since LINQ handles labels and jumps, this is pretty straightforward. We simply create label, store it in visitor, and jump to it.

Summary

We are now able to handle loops. Next time we will handle another big thing: functions.

]]>
https://blog.adamfurmanek.pl/2016/08/27/yet-another-useless-language-part-7/feed/ 1
Yet Another Useless Language Part 6 — If https://blog.adamfurmanek.pl/2016/08/20/yet-another-useless-language-part-6/ https://blog.adamfurmanek.pl/2016/08/20/yet-another-useless-language-part-6/#comments Sat, 20 Aug 2016 08:00:10 +0000 https://blog.adamfurmanek.pl/?p=1794 Continue reading Yet Another Useless Language Part 6 — If]]>

This is the sixth part of the YAUL series. For your convenience you can find other parts in the table of contents in Part 1 — Introduction

We continue developing custom language on .NET platform using LINQ expressions. Today we are going to implement conditional operator.

Grammar

We start with PLY grammar. Code for if is pretty simple:

def p_if_else_with_else(p):
    """if_else : IF '(' cond_expr ')' block_or_statement ELSE block_or_statement """
    p[0] = Compiler.If()
    p[0].Condition = p[3]
    p[0].TrueBlock = p[5]
    p[0].FalseBlock = p[7]

def p_if_else_without_else(p):
    """if_else : IF '(' cond_expr ')' block_or_statement"""
    p[0] = Compiler.If()
    p[0].Condition = p[3]
    p[0].TrueBlock = p[5]

We define if as keyword followed by condition in parenthesis, and statements. We need to handle matching else block. This part is a common source of ambiguity in parsers — imagine the following code:

if (something)
if (something2)
    print("Conditions met")
else
    print ("Condition not met")

Code is poorly formatted deliberately. We have two ifs and only one else — the question is: is the else clause matching first or second if? This problem occurs in most of today’s languages and is usually solved by assumption that the else matches closes if. So we should parse this code as:

if (something) {
    if (something2)
        print("Conditions met")
    else
        print ("Condition not met")
}

and not as:

if (something) {
    if (something2)
        print("Conditions met")
} else {
    print ("Condition not met")
}

In grammar’s terms this is known as reduce-reduce conflict.

Let’s move on. We need to parse conditions:

def p_cond_expr(p):
    """cond_expr : expr EQEQ expr
                 | expr GTEQ expr
                 | expr LSEQ expr
                 | expr NTEQ expr
                 | expr '>' expr
                 | expr '<' expr"""
    p[0] = Compiler.BinaryOperation()
    p[0].Left = p[1]
    p[0].Right = p[3]
    p[0].Sign = p[2]

This part is pretty easy. We handle common relational operators making a binary operation.

C# part

Binary operation was described in previous part. Now code for if:

using System.Linq.Expressions;

namespace Compiler
{
    public class If : IStatement
    {
        public IExpression Condition { get; set; }
        public StatementBlock TrueBlock { get; set; }
        public StatementBlock FalseBlock { get; set; }

        public If()
        {
            TrueBlock = new StatementBlock();
            FalseBlock = new StatementBlock();
        }

        public Expression Accept(IVisitor visitor)
        {
            if (FalseBlock.Statements != null)
            {
                return Expression.IfThenElse(Condition.Accept(visitor), TrueBlock.Accept(visitor),
                                             FalseBlock.Accept(visitor));
            }
            return Expression.IfThen(Condition.Accept(visitor), TrueBlock.Accept(visitor));
        }

        public void FindLabels(IVisitor visitor)
        {
            TrueBlock.FindLabels(visitor);
            if (FalseBlock.Statements != null)
            {
                FalseBlock.FindLabels(visitor);
            }
        }
    }
}

Nothing difficult here. We have only two cases to handle, if with matching else or without. We also need to recursively find labels in blocks.

Summary

We are now able to handle conditions in YAUL. Next time we will examine basic loop concept.

]]>
https://blog.adamfurmanek.pl/2016/08/20/yet-another-useless-language-part-6/feed/ 1
Yet Another Useless Language Part 5 — Variables https://blog.adamfurmanek.pl/2016/08/13/yet-another-useless-language-part-5/ https://blog.adamfurmanek.pl/2016/08/13/yet-another-useless-language-part-5/#comments Sat, 13 Aug 2016 08:00:58 +0000 https://blog.adamfurmanek.pl/?p=1781 Continue reading Yet Another Useless Language Part 5 — Variables]]>

This is the fifth part of the YAUL series. For your convenience you can find other parts in the table of contents in Part 1 — Introduction

Hi! Today we are going to add variables support to our language.

Introduction

In previous parts we saw implementation of variable type for storing values and performing operations. We also saw the grammar for YAUL which allows us to declare variables, create arrays, and perform operations on values. We do not need to declare variables explicitly, however, we need to assign them any value before they are used in subsequent operations.

Before we dig into C# code, let’s see the Python code for parsing operations.

PLY

First, we start with defining lexems.

Tokens

Before we do anything in our language, let’s define symbols, operator precedence, and compound operators:

literals = [':', ';', ',', '(', ')', '[', ']', '{', '}', 
            '=', '*', '+', '-', '/', '%', '>', '<', '!']

precedence = (
        ("left", '+', '-'),
        ("left", '*', '/', '%')   
    )

First, we define all allowed characters in our source code. Any other character will be ignored silently. Next, we define operators’ precedence. We will use it when it comes to operations on variables.

Next, let’s define keywords and operators:

keywords = {
    'print'     : 'PRINT',
    'function'  : 'FUNCTION',
    'if'        : 'IF',
    'else'      : 'ELSE',
    'while'     : 'WHILE',
    'return'    : 'RETURN',
    'new'       : 'NEW',
    'continue'  : 'CONTINUE',
    'break'     : 'BREAK',
    'jump'      : 'JUMP',
}

tokens = ['EQEQ',
          'GTEQ',
          'LSEQ',
          'NUMBER',
          'STRING',
          'IDENT',
          'LABEL'
          ] + list(keywords.values())

t_EQEQ = r"=="
t_GTEQ = r">="
t_LSEQ = r"<="

First, we define a list of keywords. You can see that we have keyword for all common constructions, we also have custom keyword for printing variables. We also define how to match operators consisting of two characters.

Let’s also ignore comments in the source code:

t_ignore_LINE_COMMENT = r'//.*'
t_ignore_BLOCK_COMMENT = r'/\*((.|\n)*?)\*/'
t_ignore = ' \t'

We should also handle line numbers and errors in order to present better error descriptions:

def t_newline(t):
    r'\n+'
    t.lexer.lineno += len(t.value)
	
errors = []

def t_error(t):
    global errors
    errors.append("Line {0:3}:\tIllegal character '{1}'".format(t.lexer.lineno, t.value[0]))
    t.lexer.skip(1)

Literals

We have only two primitives in our language: numbers (integers) and strings. So let’s handle them with PLY:

def t_STRING(t):
    r'".*?"'
    t.value = t.value.strip(r'"');
    return t

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

String is basically anything delimited with double quotation marks. Notice that we have non-greedy match in order to not catch too many characters. Numbers are just bunch of digits, nothing more. We do not handle real numbers.

OK, we can handle literals.

Variables

Let’s now handle identifiers for variables:

def t_IDENT(t):
    r'[a-zA-Z_][a-zA-Z0-9_]*'
    t.type = keywords.get(t.value,'IDENT')
    return t

Identifier is anything starting with letter or underscore, followed by any number of letters, digits, and underscores. We parse the lexem and store its type.

OK, we are able to match literals and identifiers. Let’s write code to store value in the variable:

def p_setvar_assign_variable(p):
    """setvar : IDENT '=' expr ';'"""
    p[0] = Compiler.Assignment()
    p[0].VariableName = p[1]
    p[0].ValueExpression = p[3]

Here we match expressions in form variable = literal. We can match any expression as right side of assignment, and store data in Assignment object to utilize it later. Expression is allowed to have the following form:

def p_expr_arithmetic(p):
    """expr : expr '+' expr
            | expr '-' expr
            | expr '*' expr
            | expr '/' expr
            | expr '%' expr"""
    p[0] = Compiler.BinaryOperation()
    p[0].Left = p[1]
    p[0].Right = p[3]
    p[0].Sign = p[2]

def p_expr_others(p):
    """expr : funct_call
            | literal
            | variable
            | arrayelem
            | arraydef
            | emptyarray"""

    p[0] = p[1]

First, we handle operators like addition or multiplication. We parse them and store as BinaryOperation which we will examine in next parts of this series. We can also handle different things as expressions: function calls, literals, other variables, array elements, array definitions, and empty arrays. We will cover most of them in next parts. For now let’s focus on few of them.

First, literals. We assume that literal is string or number:

def p_literal_string(p):
    """literal : STRING
               | NUMBER """
    p[0] = Compiler.ConstantExpression()
    p[0].Value = p[1]

Since we will handle types in C# code, we do not need to handle them explicitly, we just store the value.

Next, let’s handle assigning one variable to another:

def p_variable(p):
    """variable : IDENT """
    p[0] = Compiler.VariableDereference()
    p[0].Name = p[1]

We treat variable as VariableDereference. In the C# counterpart we will need to check whether the variable is already defined. If it is so — we can perform assignment.

Arrays

When it comes to arrays, we can create empty array with specified size:

def p_emptyarray(p):
    """ emptyarray : NEW '[' NUMBER ']' """
    p[0] = Compiler.ConstantArray()
    p[0].Value = Array[Compiler.SimpleObject]([Compiler.SimpleObject(0) for n in range(p[3])])

We create new array and initialize its elements to zeros. We can also define array as list of expressions:

def p_arraydef(p):
    """ arraydef : '[' list_expr ']' """
    p[0] = Compiler.ConstantArray()
    p[0].Value = Array[Compiler.SimpleObject]([Compiler.SimpleObject(n) for n in p[2]])

def p_list_expr_first(p):
    """list_expr : expr"""
    p[0] = [p[1]]

def p_list_expr_next(p):
    """list_expr : list_expr ',' expr"""
    p[0] = p[1]
    p[0].append(p[3])

We match two brackets and extract all values separated with commas. We can also extract element from array:

def p_arrayelem(p):
    """ arrayelem : IDENT '[' expr ']' """
    p[0] = Compiler.ArrayAccess()
    p[0].Name = p[1]
    p[0].Index = p[3]

And we can assign value to array’s element:

def p_setarrayelem_variable(p):
    """setarrayelem : IDENT '[' expr ']' '=' expr ';'"""
    p[0] = Compiler.ArrayAssignment()
    p[0].VariableName = p[1]
    p[0].ValueExpression = p[6]
    p[0].Index = p[3]

That’s it when it comes to PLY code. Most of it is rather straightforward, we simply extract tokens from source code and store them in custom classes in order to handle them later. Nothing fancy here.

C#

Now it is time to write C# code to handle values.

Basic variables assignment

Let’s start with assigning variables:

using System.Linq;
using System.Linq.Expressions;

namespace Compiler
{
    public class Assignment : IStatement
    {
        public string VariableName { get; set; }
        public IExpression ValueExpression { get; set; }

        public Expression Accept(IVisitor visitor)
        {
            Variable dereferencedVariable = visitor.LocalVariables.SingleOrDefault(variable => variable.Name.Equals(VariableName));
            if (dereferencedVariable == null)
            {
                dereferencedVariable = new Variable {
                    Name = VariableName,
                    VariableReference = Expression.Variable(typeof (SimpleObject), VariableName)
                };

                visitor.LocalVariables.Add(dereferencedVariable);
            }

            return Expression.Assign(dereferencedVariable.VariableReference, ValueExpression.Accept(visitor));
        }

        public void FindLabels(IVisitor visitor)
        {
        }
    }
}

As we saw in previous part, we store all variables in visitor object. First, we try to find variable by name — if there is no such a variable, we simply create it in lines 16-19. Please also notice Expression.Variable in line 18 — this is the code which creates actual variable using lambdas. This lambda will be translated into ordinary variable definition of type SimpleObject with specified name, so this will be something like: SimpleObject name;.

Next, in line 24 we assign value to variable. We pass our visitor to ValueExpression, so the value will be calculated in runtime (if it is needed).

Assignment defines no labels, so FindLables simply does nothing. Please remember, that we use labels to perform gotos, we will handle them in next parts.

To sump up: this C# code creates new variable if it needed and assigns value to it. Variable is created in two places: one place is our local list of variables, so we can track them and handle them correctly, second place is lambda. The latter place is the place where all magic happens and the variable is actually created. Assignment is handled in only one place and is performed in runtime — whether it is translated to assigning constant or function call is up to ValueExpression.Accept result.

Please also notice, that we do not have scopes. All variables are stored in one collection, we do not verify whether these variables are properly scoped. We could do it here, but let’s not bother with it for now.

OK, let’s move on. We can assign constants to variables, so let’s see the C# code:

using System.Linq.Expressions;

namespace Compiler
{
    public class ConstantExpression : IExpression
    {
        public object Value { get; set; }

        public Expression Accept(IVisitor visitor)
        {
            System.Linq.Expressions.ConstantExpression param = Expression.Constant(Value, Value.GetType());
            return Expression.Call(typeof (YaulCompiler), "ConstructSimpleObject", null, new Expression[] { param });
        }

        public void FindLabels(IVisitor visitor)
        {
        }
    }
}

No magic here — since we have parsed value from PLY, we can simply create a lambda representing constant (line 11) with correct type (determined at runtime by examining the value, it is not done by the parser, however, it could be). Next, we return a lambda representing call to static function of YaulCompiler which simply creates new object with value and returns it. We could replace this lambda will direct call to SimpleObject constructor, however, using helper function is better, since we have exactly one place of variables creation.

Let’s now see how we can assign one variable to another. In order to do that, we need to dereference existing variable. Here is the code:

using System.Linq;
using System.Linq.Expressions;

namespace Compiler
{
    public class VariableDereference : IExpression
    {
        public string Name { get; set; }
        
        public Expression Accept(IVisitor visitor)
        {
            var variable = visitor.LocalVariables.FirstOrDefault(v => v.Name.Equals(Name));
            if (variable != null)
            {
                return variable.VariableReference;
            }
            throw new VariableNotInitializedException(Name);
        }

        public void FindLabels(IVisitor visitor)
        {
        }
    }
}

First, we traverse list of variables and try to find variable matching by name. If we found one, we return it — in other case we throw an exception describing the problem.

So now we are able to assign literal to variable and variable to variable. Let’s handle some more sophisticated scenarios.

Arrays

Let’s start with empty array:

using System.Linq.Expressions;

namespace Compiler
{
    public class ConstantArray : IExpression
    {
        public object Value { get; set; }

        public Expression Accept(IVisitor visitor)
        {
            return Expression.Constant(new SimpleObject(Value));
        }

        public void FindLabels(IVisitor visitor)
        {
        } 
    }
}

We created an actual array of values in Python code, so now we only need to create a constant representing the object. Please notice, that Expression.Constant means a predefined value, not a constant like const int. We don’t care whether this array is with predefined size or created from expression list — we have all the values provided by PLY and we can create the array.

When it comes to accessing array elements, we have the following code:

using System.Linq;
using System.Linq.Expressions;

namespace Compiler
{
    public class ArrayAccess : IExpression
    {
        public string Name { get; set; }
        public IExpression Index { get; set; }

        public Expression Accept(IVisitor visitor)
        {
            var argument = Index.Accept(visitor);
            return Expression.Call(typeof (SimpleObject), "GetElement", null, new Expression[]{
                visitor.LocalVariables.Single(variable => variable.Name.Equals(Name))
                        .VariableReference,
                argument
            });
        }

        public void FindLabels(IVisitor visitor)
        {
        }
    }
}

In line 13 we extract the index of element. Since the index might be a value calculated in runtime, we simply transform it to lambdas and call helper method from SimpleObject (line 14). Please also notice that we don’t check whether we have variable defined or not — in the latter situation we will throw an exception. Here you can see why it is a good idea to have custom exception types — getting NullReferenceException explains much less than custom type.

Let’s now see the code for changing array’s element:

using System.Linq;
using System.Linq.Expressions;

namespace Compiler
{
    public class ArrayAssignment : IStatement
    {
        public string VariableName { get; set; }
        public IExpression ValueExpression { get; set; }
        public IExpression Index { get; set; }

        public Expression Accept(IVisitor visitor)
        {
            Variable dereferencedVariable = visitor.LocalVariables.FirstOrDefault(variable => variable.Name.Equals(VariableName));

            if (dereferencedVariable == null)
            {
                throw new VariableNotInitializedException(VariableName);
            }

            return Expression.Call(typeof(SimpleObject), "SetElement", null, new Expression[] {
                dereferencedVariable.VariableReference,
                Index.Accept(visitor),
                ValueExpression.Accept(visitor)
            });
        }

        public void FindLabels(IVisitor visitor)
        {
        }
    }
}

First, we look for variable. If we can’t find it — we throw. Next, we call helper method SetElement and pass all required arguments — array variable, array index, and value for element.

Arithmetic

There is one more thing which we can cover today — binary operations. In PLY we defined a function for parsing binary operations like addition, here is C# code for performing actual calculations:

using System;
using System.Linq.Expressions;

namespace Compiler
{
    public class BinaryOperation : IExpression
    {
        public IExpression Left { get; set; }
        public IExpression Right { get; set; }
        public string Sign { get; set; }

        public Expression Accept(IVisitor visitor)
        {
            var leftSide = Left.Accept(visitor);
            var rightSide = Right.Accept(visitor);

            switch (Sign)
            {
                case "+":
                    return Expression.Add(leftSide, rightSide);
                case "-":
                    return Expression.Subtract(leftSide, rightSide);
                case "*":
                    return Expression.Multiply(leftSide, rightSide);
                case "/":
                    return Expression.Divide(leftSide, rightSide);
                case "%":
                    return Expression.Modulo(leftSide, rightSide);
                case "<":
                    return Expression.LessThan(leftSide, rightSide);
                case "<=":
                    return Expression.LessThanOrEqual(leftSide, rightSide);
                case ">":
                    return Expression.GreaterThan(leftSide, rightSide);
                case ">=":
                    return Expression.GreaterThanOrEqual(leftSide, rightSide);
                case "==":
                    return Expression.Equal(leftSide, rightSide);
                case "!=":
                    return Expression.NotEqual(leftSide, rightSide);
                default:
                    throw new InvalidOperationException(string.Format("Incorrect operation sign: {0}", Sign));
            }
        }

        public void FindLabels(IVisitor visitor)
        {
        }
    }
}

A bit more code than in other snippets, however, there is nothing fancy here. We simply try to match operator and create correct lambda for performing the operation. You can also notice, that we handle much more operators than in PLY (namely, comparison operators) — we will explain them in next parts.

Summary

We are now able to assign values to variables. In next part we are going to examine if construct.

]]>
https://blog.adamfurmanek.pl/2016/08/13/yet-another-useless-language-part-5/feed/ 1
Yet Another Useless Language Part 2 — Grammar https://blog.adamfurmanek.pl/2016/07/23/yet-another-useless-language-part-2/ https://blog.adamfurmanek.pl/2016/07/23/yet-another-useless-language-part-2/#comments Sat, 23 Jul 2016 08:00:11 +0000 https://blog.adamfurmanek.pl/?p=1765 Continue reading Yet Another Useless Language Part 2 — Grammar]]>

This is the second part of the YAUL series. For your convenience you can find other parts in the table of contents in Part 1 — Introduction

Last time we described features of a language we are going to write. Today we are going to define its grammar using EBNF-like notation.

Notation

We will describe the notation using the following syntax:

  • Elements can be written in any case
  • Optional elements are written in square brackets: [Optional]
  • One or more elements are written in curly brackets: {one_or_more}
  • Literals are written in quotation marks: 'literal'
  • Literals are case insensitive
  • Question marks indicates parts described in natural language

Identifiers

We start with defining identifiers. We will handle only latin letters, digits, and underscores. Variable’s name will need to start with letter or underscore:

IDENT = letter_or_underscore , [ { letter_or_underscore_or_digit } ] ;
letter_or_underscore = letter | underscore ;
letter_or_underscore_or_digit = letter_or_underscore | digit;
underscore = '_' ;
digit = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' ;
letter = 'a' | 'b' | 'c' | 'd' | 'e' | 'f' | 'g' | 'h' | 'I' | 'j' | 'k' | 'l' | 'm' | 'n' | 'o' | 'p' | 'q' | 'r' | 's' | 't' | 'u' | 'v' | 'w' | 'x' | 'y' | 'z' | 'A' | 'B' | 'C' | 'D' | 'E' | 'F' | 'G' | 'H' | 'I' | 'J' | 'K' | 'L' | ' M' | 'N' | 'O' | 'P' |' Q' | 'R '| 'S' | 'T' | 'U' | 'V' | 'W' | 'X' | 'Y' | 'Z' ;

We can see that IDENT is a letter_or_underscore, optionally followed by letters, digits, or underscores. We also specified all possible letters and digits we can handle.

Trivial values

Numbers are just digits, we don’t handle fractions:

NUMBER = {digit} ;

Strings are just any printable characters delimited by double quotation marks:

STRING = '"' ; whatever , '"' ;
whatever = ? any_printable_character ? ;

Variables are just identifiers:

variable = IDENT ;

Literals are strings or numbers:

literal = STRING | NUMBER ;

We can modify variable’s value:

setvar = IDENT , '=' , expr , ';' ;

We can allocate new array:

emptyarray = 'NEW' , '[' , NUMBER , ']' ;

We can also allocate array using expressions:

arraydef = '[' list_expr ']'
list_expr = expr , [{ ',' , expr }] ;

We can get or set array’s element using:

arrayelem = IDENT , '[' , expr , ']' ;
setarrayelem = IDENT , '[' , expr , ']' , '=' , expr , ';' ;

Expression can be:

expr = expr , '+' , expr | expr , '-' , expr | expr , '*' , expr | expr , '/' , expr | expr , '%' , expr | proc_call | literal | variable | arrayelem | arraydef | emptyarray ;

We have if:

if_else = 'IF' , '(' , cond_expr , ')' , block_or_statement [ , 'ELSE' , block_or_statement ] ;
cond_expr = expr , '==' , expr | expr , '>=' , expr | expr , '<=' , expr | expr , '>' , expr | expr , '<' , expr | expr , '!=' , expr | '!' , expr ;
block_or_statement = block | statement ;

We have loop:

while = 'WHILE' , '(' , cond_expr , ')' , block_or_statement ;
break = 'BREAK' , ';' ;
continue = 'CONTINUE' , ';' ;

We can print value:

print = 'PRINT' , expr , ';' ;

We can define label — identifier ending with exclamation mark. We can also jump to it:

LABEL = letter_or_underscore , [ { letter_or_underscore_or_digit } , ] '!' ;
jump = 'JUMP' , LABEL , ';' ;

We can declare a function:

function_decl = 'function' , IDENT , '(' , [ list_param , ] ')' , block ;
list_param = [ list_param ',' , ] IDENT ;

We can define its body:

block = '{' , [ list_statement , ] '}' ;
list_statement = { statement } ;
statement = setvar | setarrayelem | if_else | while | proc_call | return | break | continue | print | jump | label ;

We can return value from a function:

return = 'RETURN' , [ expr , ] ';' ;

We can call functions:

proc_call = IDENT , '(' , [ list_expr , ] ')' , ';' ;

Finally, our program is a list of functions or statements:

start = program
program = list_function_statement ;
list_function_statement = [list_function_statement , ] function_decl | [list_function_statement , ] statement };

Summary

OK, we have our grammar. For now it is only for reference, since we will parse it in one of the last parts of this series. Next time we will write some code to represent values in memory and perform basic operations.

]]>
https://blog.adamfurmanek.pl/2016/07/23/yet-another-useless-language-part-2/feed/ 1
Yet Another Useless Language Part 1 — Introduction https://blog.adamfurmanek.pl/2016/07/16/yet-another-useless-language-part-1/ https://blog.adamfurmanek.pl/2016/07/16/yet-another-useless-language-part-1/#comments Sat, 16 Jul 2016 08:00:14 +0000 https://blog.adamfurmanek.pl/?p=1761 Continue reading Yet Another Useless Language Part 1 — Introduction]]>

This is the first part of the Yet Another Useless Language series. For your convenience you can find other parts using the links below :
Part 1 — Introduction
Part 2 — Grammar
Part 3 — Variables
Part 4 — Compiler
Part 5 — Variables
Part 6 — If
Part 7 — Loop
Part 8 — Function
Part 9 — Standard library
Part 10 — Parser
Part 11 — Storing binary on the disk

Today we start implementing simple language compiler using C# and PLY library. Before we dig into code let’s write down language features we would like to implement.

YAUL

We are going to implement scripting language using .NET Expressions for compiling and executing the code. Basically, we will define custom grammar, parse script’s source code using PLY, and create expression tree representing the script. Next, we will compile the expression tree and execute it. Thanks to that we will be able to easily compile the script code to executable binary written in managed code (so it will require .NET Framework to run).

We will implement the following features:

  • local variables declared ad-hoc
  • arrays
  • strings
  • while loop
  • function and recursive functions
  • basic standard library

Our syntax will resemble Python’s, e.g., hello world will look like this:

print "Hello, world";

We will handle integers, arrays, and strings:

number = 5;
text = "Hello!";
array = [1, "two", 3];
big_array = new [50];

We will handle common operations on numbers and strings. We will be able to extract characters from strings using array access operator.

We will handle if with else and while loop with break, continue. We will also handle goto to label.

We will be able to define recursive functions with our without return type. We will also have simple standard library with few functions.

All variables will be treated as local variables (no globals), also accessing the variable before its definition will be marked as an error. Flow instructions like if or while will not change the variables’ scoping — they will not define new scope. All functions will accept variables by value, there will be no passing by reference. Also, function’s parameters will be handled as an ordinary local variables.

Some basic programs written in YAUL:

Fibonacci:

first = 1;
second = 1;
while(first < 1000){
	print first;
	temp = second;
	second = first + second;
	first = temp;
}

Finding prime numbers:

function isPrime(number){
	temp = 2;
	while(temp < number){
		if(number % temp == 0){
			return 0;
		}
		temp = temp + 1;
	}

	return 1;
}

function checkPrimes(low, up){
	temp = low;
	while(temp <= up){
		if(isPrime(temp) == 1){
			print temp;
		}
		temp = temp + 1;
	}
}

checkPrimes(2,100);

Printing pyramid:

function printPyramid(length){
	if(length > 1){
		printPyramid(length - 1);
	}
	temp = 0;
	line = "";
	while(temp < length){
		line = line + "*";
		temp = temp + 1;
	}
	print line;
}

printPyramid(8);

In the next part we are going to write down the grammar for YAUL.

]]>
https://blog.adamfurmanek.pl/2016/07/16/yet-another-useless-language-part-1/feed/ 10
Using PLY in C# https://blog.adamfurmanek.pl/2016/06/18/using-ply-in-c/ https://blog.adamfurmanek.pl/2016/06/18/using-ply-in-c/#respond Sat, 18 Jun 2016 08:00:38 +0000 https://blog.adamfurmanek.pl/?p=1727 Continue reading Using PLY in C#]]> In this post I describe how to use Python Lex-Yacc (PLY) in C# with IronPython.

Introduction

IronPython is a .NET implementation of Python interpreter. It is based on Python 2.7.x and can be executed as separate application or run directly from .NET applications.

PLY is a Python implementation of Lex and Yacc. These tools are used to write compilers — they are capable of parsing file using LALR grammar and produce AST.

In this post I describe how to execute PLY from C# using IronPython. I have verified all steps in Visual Studio 2012-2015.

Let’s go

First, you need to create a project and install IronPython using Nuget. You can do it from package manager command line or by using Visual Studio package manager GUI. After installing IronPython you should have the following references in your project:

  • IronPython
  • IronPython.Modules
  • Microsoft.Dynamic
  • Microsoft.Scripting
  • Microsoft.Scripting.Metadata

Next, you need to download PLY files. Create a directory Lib\ply in your project directory and put there all PLY files (__init__.py, cpp.py, ctokens.py, lex.py, and yacc.py).

Since PLY uses some popular Python packages, you probably need to extract some files from ordinary Python distribution. In order to do that, just download Pyton 2.7.x, zip its Lib directory to python27.zip, and put the file in Lib directory in your project. You can remove redundant files from the archive and leave only ones required by PLY, however, it is not necessary for now.

Make sure, that all Lib files (python27.zip and PLY scripts) are copied to output directory — select them in Solution Explorer, hit F4, and set the flag appropriately. Also, change build action for python27.zip to Embedded Resource.

Now it is time to write some code. First, create a script Parsing\parser.py with method parse accepting text to parse:

def parse(text):
    if lex == 0 or yacc == 0:
        raise RuntimeError("Not initialized")

    global errors
    errors = []

    parsedObject = parser.parse(text, lexer=lexer)
    return parsedObject

This is just a stub of a parsing method which will invoke parsing using PLY. I leave all the rest for you, since the rest is just a PLY code (grammar definitions, code for building AST etc).

You also need another one function in your script, which will be used to load all PLY scripts:

def initialize(plyBasePath):
    global yacc
    global lex
    global sys
    global clr
    global parser
    global lexer
    global System

    import imp
    import sys
    import clr

    lex = imp.load_source('ply.lex', plyBasePath + '\\lex.py')
    yacc = imp.load_source('ply.yacc',  plyBasePath + '\\yacc.py')
    lexer = lex.lex(module = sys.modules[__name__], debug=1)
    parser = yacc.yacc(module = sys.modules[__name__])

    clr.AddReference("System")
    import System

This function defines variables for PLY, loads some .NET libraries, end prepares parser and lexer. Notice that it is running PLY in debug mode but it is only for tutorial purposes.

Make sure that parser.py is copied to output directory.

This is it when it comes to Python part (not including grammar and rest of PLY actual code). Now let’s move to C# part.

Let’s define a class for parser:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Reflection;
using IronPython.Hosting;
using IronPython.Modules;
using Microsoft.Scripting.Hosting;

namespace Parsing
{
    public class Parser
    {
        private static dynamic _ipy;

        public Parser()
        {
            _ipy = _ipy ?? CreateEngine();
        }

        private dynamic CreateEngine()
        {
            ScriptRuntimeSetup setup = Python.CreateRuntimeSetup(GetRuntimeOptions());
            var pyRuntime = new ScriptRuntime(setup);
            ScriptEngine engineInstance = Python.GetEngine(pyRuntime);

            AddPythonLibrariesToSysMetaPath(engineInstance);

            dynamic ipy = pyRuntime.UseFile(@"Parsing\parser.py");
            ipy.initialize(GetPlyPath());

            return ipy;
        }

        private static Dictionary < string, object> GetRuntimeOptions()
        {
            var options = new Dictionary < string, object>();
            options["Debug"] = false;
            return options;
        }

        private static string GetPlyPath()
        {
            return Path.Combine(Environment.CurrentDirectory, "Lib", "ply");
        }
    }
}

We have simple Parser class which creates an IronPython runtime. You can notice that I am passing Debug flag to runtime so it will print out all debugging informations when something wrong happens.

As we can see, first I create a runtime with runtime options. Next, I add Python libraries to Python paths so our scripts will be able to use them. Next, I load a file parser.py and call its initialize method (the one which I described above).

Below is a function which adds Python libraries to path:

private void AddPythonLibrariesToSysMetaPath(ScriptEngine engineInstance)
{
	Assembly asm = GetType().Assembly;
	IEnumerable < string> resQuery =
		from name in asm.GetManifestResourceNames()
		where name.ToLowerInvariant().EndsWith("python27.zip")
		select name;
	string resName = resQuery.Single();
	var importer = new ResourceMetaPathImporter(asm, resName);
	dynamic sys = engineInstance.GetSysModule();
	sys.meta_path.append(importer);
	sys.path.append(importer);
}

First, we get current executing assembly. Next, we access Python libraries stored in a zip file which is in resources. Next, we use importer to extract them in memory and add to path.

Finally, here comes the crucial method for parsing code:

public object Parse(string content)
{
	return _ipy.parse(content);
}

We simply pass a file content to IronPython runtime and call parse method from our script.

]]>
https://blog.adamfurmanek.pl/2016/06/18/using-ply-in-c/feed/ 0