This is the second part of the YAUL series. For your convenience, you can find the other parts in the table of contents in Part 1: Introduction.

Last time we described features of a language we are going to write. Today we are going to define its grammar using EBNF-like notation.

Notation

We will describe the grammar using the following notation:

Elements can be written in any case
Optional elements are written in square brackets: [Optional]
One or more elements are written in curly brackets: {one_or_more}
Literals are written in quotation marks: 'literal'
Literals are case insensitive
Question marks indicate parts described in natural language

Identifiers

We start by defining identifiers. We will handle only Latin letters, digits, and underscores. A variable’s name must start with a letter or an underscore:

IDENT = letter_or_underscore , [ { letter_or_underscore_or_digit } ] ;
letter_or_underscore = letter | underscore ;
letter_or_underscore_or_digit = letter_or_underscore | digit;
underscore = '_' ;
digit = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' ;
letter = 'a' | 'b' | 'c' | 'd' | 'e' | 'f' | 'g' | 'h' | 'i' | 'j' | 'k' | 'l' | 'm' | 'n' | 'o' | 'p' | 'q' | 'r' | 's' | 't' | 'u' | 'v' | 'w' | 'x' | 'y' | 'z' | 'A' | 'B' | 'C' | 'D' | 'E' | 'F' | 'G' | 'H' | 'I' | 'J' | 'K' | 'L' | 'M' | 'N' | 'O' | 'P' | 'Q' | 'R' | 'S' | 'T' | 'U' | 'V' | 'W' | 'X' | 'Y' | 'Z' ;

We can see that IDENT is a letter_or_underscore, optionally followed by letters, digits, or underscores. We also specified all possible letters and digits we can handle.
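
The IDENT rule can be checked mechanically. Below is a minimal Python sketch, assuming the rules above are the complete definition of an identifier. The names IDENT_PATTERN and is_ident are made up for this illustration and are not part of YAUL.

```python
import re

# letter_or_underscore followed by zero or more letters, digits, or underscores.
# The character classes are spelled out so that only Latin letters and the
# digits 0-9 are accepted, as in the grammar.
IDENT_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_ident(text):
    """Return True when the whole text is a valid IDENT."""
    return IDENT_PATTERN.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ("counter", "_tmp1", "1st", "a-b", ""):
        print(repr(candidate), is_ident(candidate))
```

Using fullmatch rather than match matters here: match would accept "a-b" because its first character alone already satisfies the pattern.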

Trivial values

Numbers are just sequences of digits; we don’t handle fractions:

NUMBER = {digit} ;

Strings are just any printable characters delimited by double quotation marks:

STRING = '"' , [ { whatever } ] , '"' ;
whatever = ? any_printable_character ? ;

Variables are just identifiers:

variable = IDENT ;

Literals are strings or numbers:

literal = STRING | NUMBER ;
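
To make the literal rule concrete, here is a small Python sketch of a reader that recognizes a NUMBER or a STRING at a given position. It is only an illustration: the function name read_literal and the returned tuple shape are invented here, and it assumes that a string cannot contain a double quotation mark, since the grammar describes no escape mechanism.

```python
DIGITS = "0123456789"


def read_literal(text, pos=0):
    """Return (kind, value, next_pos) for a literal starting at pos, or None."""
    if pos >= len(text):
        return None
    if text[pos] in DIGITS:
        # NUMBER: one or more digits.
        end = pos
        while end < len(text) and text[end] in DIGITS:
            end += 1
        return ("NUMBER", int(text[pos:end]), end)
    if text[pos] == '"':
        # STRING: printable characters up to the closing quotation mark.
        end = pos + 1
        while end < len(text) and text[end] != '"' and text[end].isprintable():
            end += 1
        if end < len(text) and text[end] == '"':
            return ("STRING", text[pos + 1:end], end + 1)
    return None  # not a literal, or an unterminated string
```

The third element of the result is the position just after the literal, so a caller can continue reading from there.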

We can modify a variable’s value:

setvar = IDENT , '=' , expr , ';' ;

We can allocate a new array:

emptyarray = 'NEW' , '[' , NUMBER , ']' ;

We can also create an array from a list of expressions:

arraydef = '[' , list_expr , ']' ;
list_expr = expr , [{ ',' , expr }] ;

We can get or set an array’s element using:

arrayelem = IDENT , '[' , expr , ']' ;
setarrayelem = IDENT , '[' , expr , ']' , '=' , expr , ';' ;

An expression can be any of the following:

expr = expr , '+' , expr | expr , '-' , expr | expr , '*' , expr | expr , '/' , expr | expr , '%' , expr | proc_call | literal | variable | arrayelem | arraydef | emptyarray ;
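
The expr rule above is left-recursive and does not say which operator binds tighter, so a parser has to pick a convention. The following Python sketch shows one possible approach, recursive descent with one function per precedence level. It is not the parser from this series: it assumes the usual arithmetic precedence ('*', '/', '%' before '+', '-') and left associativity, neither of which is stated by the grammar, and it handles only NUMBER operands to stay short. All function names are hypothetical.

```python
import re

# A token is a run of digits or a single arithmetic operator.
TOKEN = re.compile(r"\s*([0-9]+|[-+*/%])")


def tokenize(text):
    """Split an arithmetic expression into a list of token strings."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_expr(tokens):
    """Lowest precedence level: '+' and '-' (assumed left-associative)."""
    node = parse_term(tokens)
    while tokens and tokens[0] in ("+", "-"):
        op = tokens.pop(0)
        node = (op, node, parse_term(tokens))
    return node


def parse_term(tokens):
    """Higher precedence level: '*', '/' and '%' (assumed left-associative)."""
    node = parse_operand(tokens)
    while tokens and tokens[0] in ("*", "/", "%"):
        op = tokens.pop(0)
        node = (op, node, parse_operand(tokens))
    return node


def parse_operand(tokens):
    """Only NUMBER operands are supported in this sketch."""
    if not tokens or not tokens[0].isdigit():
        raise SyntaxError("expected a number")
    return int(tokens.pop(0))
```

The loops replace the left recursion of the rule: each iteration wraps the tree built so far as the left operand, which keeps operators of equal precedence grouped from left to right.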

We have a conditional statement:

if_else = 'IF' , '(' , cond_expr , ')' , block_or_statement , [ 'ELSE' , block_or_statement ] ;
cond_expr = expr , '==' , expr | expr , '>=' , expr | expr , '<=' , expr | expr , '>' , expr | expr , '<' , expr | expr , '!=' , expr | '!' , expr ;
block_or_statement = block | statement ;

We have a loop:

while = 'WHILE' , '(' , cond_expr , ')' , block_or_statement ;
break = 'BREAK' , ';' ;
continue = 'CONTINUE' , ';' ;

We can print a value:

print = 'PRINT' , expr , ';' ;

We can define a label, which is an identifier ending with an exclamation mark, and we can jump to it:

LABEL = letter_or_underscore , [ { letter_or_underscore_or_digit } , ] '!' ;
jump = 'JUMP' , LABEL , ';' ;
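
One way to use labels is to record where each one is defined so that a jump can look up its target. The Python sketch below illustrates that idea on an already tokenized program. It is an assumption about how jumps might be resolved, not something the grammar prescribes, and the names LABEL_PATTERN and collect_labels are invented for the example.

```python
import re

# An identifier immediately followed by an exclamation mark.
LABEL_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*!")


def collect_labels(tokens):
    """Map each defined label name to the index of its token."""
    labels = {}
    for index, token in enumerate(tokens):
        if not LABEL_PATTERN.fullmatch(token):
            continue
        # A LABEL right after JUMP is a jump target, not a definition.
        # Keywords are compared case-insensitively, as literals are in the grammar.
        if index > 0 and tokens[index - 1].upper() == "JUMP":
            continue
        labels[token[:-1]] = index
    return labels


if __name__ == "__main__":
    program = ["start!", "PRINT", "1", ";", "jump", "start!", ";"]
    print(collect_labels(program))
```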

We can declare a function:

function_decl = 'function' , IDENT , '(' , [ list_param , ] ')' , block ;
list_param = [ list_param , ',' , ] IDENT ;
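
The list_param rule is written recursively, but what it describes is simply a comma-separated list of identifiers. The Python sketch below reads the text between the parentheses of a function declaration with a plain loop. The function name parse_param_list is hypothetical, and ignoring whitespace around names is an assumption, since the grammar does not mention whitespace.

```python
import re

IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def parse_param_list(text):
    """Parse the text between '(' and ')' into a list of parameter names."""
    text = text.strip()
    if not text:
        return []  # list_param is optional in function_decl
    params = []
    for part in text.split(","):
        name = part.strip()
        if not IDENT.fullmatch(name):
            raise SyntaxError(f"invalid parameter name: {name!r}")
        params.append(name)
    return params


if __name__ == "__main__":
    print(parse_param_list("a, b_2, c"))
```

An empty name, as produced by "a,,b" or a trailing comma, is rejected because the rule requires an IDENT after every comma.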

We can define its body:

block = '{' , [ list_statement , ] '}' ;
list_statement = { statement } ;
statement = setvar | setarrayelem | if_else | while | proc_call | return | break | continue | print | jump | label ;

We can return a value from a function:

return = 'RETURN' , [ expr , ] ';' ;

We can call functions:

proc_call = IDENT , '(' , [ list_expr , ] ')' , ';' ;

Finally, our program is a list of functions or statements:

start = program ;
program = list_function_statement ;
list_function_statement = [ list_function_statement , ] function_decl | [ list_function_statement , ] statement ;

Summary

OK, we have our grammar. For now it is only for reference, since we will parse it in one of the last parts of this series. Next time we will write some code to represent values in memory and perform basic operations.