More elaborate writing games

Peter Susanszky

More elaborate writing games

So far, our writing games basically put us into the role of a typewriter, where at each step, we could make a formula 1 symbol longer from our alphabet than it was before. The problem with this approach is that as it stands, it doesn’t really distinguish between those sequences of symbols that may be interesting for one reason or another, and those which are just gibberish.

In fact, we really didn’t need to go to the lengths we did, using base rules and productive rules, to specify any of the languages above. For note that in general, given their respective alphabets, in almost all of these languages, any finite sequence of symbols from the alphabet constituted a formula. With some modification, this definition can also be adapted to the languages

$\mathcal{L}_\mathbb{N}$

and

$\mathcal{L}_\mathbb{Z}$

.

Exercise 2.14. As just mentioned, for the languages preceding

$\mathcal{L}_\mathbb{N}$ and

$\mathcal{L}_\mathbb{Z}$ , we could specify immediately: Any finite sequence of symbols of the alphabet is a formula of the language. How would you change this definition to capture being a formula of

$\mathcal{L}_\mathbb{N}$ and

$\mathcal{L}_\mathbb{Z}$ ?

The language $\mathcal{L}_f$

To illustrate another way, most languages you are familiar with have various levels of expressions, and at each level, certain ‘well-formed’ expressions are distinguished from those that are not so. For example, as mentioned above,

$\mathcal{L}_{EA}$

is the English Alphabet language, which has as formulas all finite sequences of letters of the English alphabet. This entails that every English word is a formula of

$\mathcal{L}_{EA}$

, but most formulas of

$\mathcal{L}_{EA}$

are just random combinations of letters and not words of the English language. Similarly, when it comes to combinations of words in English, most of them are not well-formed. For example, while “I am running late from class” is a well-formed expression, “am class running I late from” is not. In other words, English sentences make up a small subset of all combinations of words of English.

To make these ideas more vivid, we can introduce a new language, which we will call

$\mathcal{L}_F$

. ‘

$F$

’ in the subscript stands for file system. You are probably familiar with file systems on your computer, that is, the files and folders on your computer that you can navigate with a click of a button. These file systems have several ‘levels’, based on the folders you have. Importantly, folders are stackable, so that you can have a folder inside a folder, and a folder inside a folder inside a folder, etc. The language

$\mathcal{L}_F$

will be able to specify which folders and files are in which folders, and once given an expression, you will be able to read off the file structure given by a certain formula of the language.

One thing that you may be less familiar with is the idea of a root folder. Note that on every computer, files and folders are in other folders, but only until they aren’t. Specifically, there is a folder on your computer that is not itself in any folder. That is your root (on Windows, think of C:\). As we specify the language

$\mathcal{L}_F$ in the following, we will be assuming that we are either in the root folder or a relative root folder (the ‘current working directory’). Essentially, we will be able to specify what is ‘under’ the folder we are working in, but not anything above it.

To start with our usual specification of the alphabet, we shall make use of certain base expressions, denoting (intuitively) the single files of our file system, and other expressions that are used merely to give structure to more complex expressions.

Let’s suppose that we have 5 distinct files, which we may denote

$f_1$

,

$f_2$

,

$f_3$

,

$f_4$

and

$f_5$

for simplicity. Next, we shall use the comma symbol ‘

$,$

’ to delineate individual files and folders that are ‘on the same level’, that is, in the same folder. For example, we may say something like

$f_1, f_3, f_5$

to specify there are three files. We also need a way to signal that some files and folders together form a folder. For this, we can put everything that belongs to a folder into curly brackets ‘

$\{$

’ and ‘

$\}$

’. So

$f_1, f_3, f_5$

denotes the files at the highest level, while

$\{f_1, f_3, f_5\}$

denotes these files in a single folder.

The above ‘intuitive’ specification can be put into our usual notation as follows.

Alphabet: The alphabet of

$\mathcal{L}_F$ consists of the symbols:

f_1

\mid

f_2

\mid

f_3

f_4

\mid

f_5

\mid

,

$\mid$ ${$ $\mid$ $}$

I used the symbol ‘

$\mid$

’ to give the above list since the comma symbol ‘

$,$

’ is itself a symbol of our language, so listing it with commas would look confusing. Specifying a language inside another is usually a painful experience.

Now for the base rule, we can say the following:

Base Rule of

$\mathcal{L}_F$ :

The symbols

$f_1$ ,

$f_2$ ,

$f_3$ ,

$f_4$ and

$f_5$ by themselves are all formulas. Moreover,

${}$ is also a formula.

Then, we need two separate productive rules, one for listing the contents of a folder, and another for denoting that some list of things is in a folder. Accordingly:

Productive Rule 1 of

$\mathcal{L}_F$ : if

$X$ and

$Y$ are formulas, then

$X, Y$ is a formula.

Productive Rule 2 of

$\mathcal{L}_F$ : if

$X$ is a formula, then

${X}$ is a formula.

Now we can start writing down formulas, and then try to understand what they actually say. One thing that immediate looks out of place is

$\{\}$

. Intuitively, what

$\{\}$

denotes is the empty folder. After all, there may be folders that do not have anything in them. In turn, there can be several empty folders in a folder, and in general, there may be a whole tower of folders that do not have any files at any level.

Exercise 2.15. Check whether the following sequences of symbols are formulas of the language

$\mathcal{L}_f$ , and if so, try to write down what file structure they represent:

$\{\}$
$\{\}, \{\}$
$\{\{\},\{\}\}$
$\{\{\{\}\}$
$\{\{\{\}, \{\}\}, \{\}\}$
$\{\{\}, \{\}\}, \{\}$

Of course, you can have file systems with actual files in them too, and you can specify these in our language.

Exercise 2.16. Check whether the following sequences of symbols are formulas of the language

$\mathcal{L}_f$ , and if so, try to write down what file structure they represent:

$f_1, f_2, \{\{f_2, f_4\}, \{f_3,f_3\}\}$
$\{f_1, \{\}\}$
$f_1$
$\{\{\{\{f_4f_2\}\}\}\}$
$\{\{f_2\}, \{\{f_3\}, f_5\}, \{f_1\}\}$
$\{f_1\}, \{\{f_2\}\}, \{\{\{f_3\}\}\}$

Exercise 2.17. Look at the first formula of Exercise 2.16 again. Can you see something peculiar in what it says? How would you make sense of it?

Let’s return to the distinction between well-formed formulas and random sequences of symbols of the alphabet. Given how the language

$\mathcal{L}_f$

is specified, not every sequence of symbols of the alphabet constitutes a formula. In Exercises 2.15 and 2.16, there were two sequences of symbols of the alphabet that are not formulas of the language¹ Here, it actually becomes useful to show by a derivation as before whether something is a formula of the language.

To save some space, let’s call Productive Rule 1 ‘PR1’, and Productive Rule 2 ‘PR2’. Similarly, we can use ‘BR’ instead of Base Rule. Here are two sample derivations of formulas from above.

(1)	${}$	(BR)
(2)	${}, {}$	(PR1: 1, 1)
(3)	${{}, {}}$	(PR2: 2)
(4)	${{}, {}}, {}$	(PR1: 1,3)

(1)	$f_{1}$	(BR)
(2)	${}$	(BR: 1)
(3)	$f_{1}, {}$	(PR1: 1, 2)
(4)	${f_{1}, {}}$	(PR2: 3)

Exercise 2.18. Derive all the other formulas of the language

$\mathcal{L}_f$

from Exercises 2.15 and 2.16. Note: if something is not a formula, you cannot derive it (though you can certainly try).

Finally, let’s think a bit about

$\{$

and

$\}$

. These curly braces are really what give our language its structure and its expressive power. In fact, though the comma is useful for humans like us to parse formulas, it is not really necessary. We can change the PR1 to omit it like this:

Productive Rule 1∗ of

$\mathcal{L}_F$ : if

$X$ and

$Y$ are formulas, then

$X Y$ is a formula.

Then, we get the following formulas from above:

$\begin{aligned} \{\{\}, \{\}\}, \{\} &\mapsto \{\{\}\{\}\}\{\}\\ \{f_1, \{\}\} &\mapsto \{f_1\{\}\} \end{aligned}$

Exercise 2.19. Rewrite all the formulas of

$\mathcal{L}_f$ from Exercises 2.15 and 2.16 assuming we changed PR1 into PR1

$^*$ (in other words, we did away with the previous comma convention).

On the other hand, we really cannot do away with our curly braces, since they are the ones that tell us what we should consider as being in a folder. It is very important where we put those braces, since the resulting formula may represent something entirely different from what we intended. For example, if you are interested in how many folders there are in your root folder, it matters whether the formula is

$\{\}, \{\}, \{\}, \{\}$

or

$\{\{\}, \{\}, \{\}, \{\}\}$

. The first one says there are 4 folders, the second says there is only one.

Exercise 2.20. How many files or folders are there in root if the formula specifies the file structure

$f_1, f_2, \{\{f_2, f_4\}, \{f_3,f_3\}\}$ ?

The language $\mathcal{L}_{AE}$

The next language we will consider is the language

$\mathcal{L}_{AE}$

, the language of artihmetic expressions. This language is something you are familiar with from your high school studies. Essentially, expressions in

$\mathcal{L}_{AE}$

are the arithmetic expressions that you had to compute with, and which can flank the identity symbol

$=$

. For example,

$(3+4)-(5 \times 6)$

,

$(300\times 33)$

,

$(5555-3333)$

,

$(23 \times 3)$

. Note that which numbers we are working with is not immaterial, since these numbers will appear in these formulas. Accordingly, we will be using the positive and negative integers plus

$0$

(i.e.,

$\mathcal{L}_\mathbb{Z}$

).

Let’s start by explicitly defining

$\mathcal{L}_\mathbb{Z}$

. Making use of the above simplifications regarding expressions without structural delineators, we may say:

Alphabet of

$\mathcal{L}_\mathbb{Z}$ : The alphabet of the language consists of

$0$ ,

$1$ ,

$2$ ,

$3$ ,

$4$ ,

$5$ ,

$6$ ,

$7$ ,

$8$ ,

$9$ and

$-$ .

Formulas of

$\mathcal{L}_\mathbb{Z}$ : Any finite sequence of symbols

$0$ ,

$1$ ,

$2$ ,

$3$ ,

$4$ ,

$5$ ,

$6$ ,

$7$ ,

$8$ ,

$9$ is a positive integer provided the first member of the sequence is not

$0$ . If

$X$ is a positive integer,

$- X$ is a negative integer. Then,

$X$ is an integer (formula) provided it is a positive integer, a negative integer, or it is

$0$ .

Remark 2.2. Note that this definition is slightly different from the previous ones. It first directly defines the positive integers. Then, it defines the negative integers from the positive ones. Then, it considers

$0$ , and adds these three groups together.

Based on

$\mathcal{L}_\mathbb{Z}$

, one can define the language of arithmetic expressions,

$\mathcal{L}_{AE}$

quite simply. Clearly, its alphabet will consist of the alphabet of

$\mathcal{L}_\mathbb{Z}$

, plus additional symbols to formulate arithmetic expressions. These symbols are the usual ones, that is:

$+$

,

$\times$

, and

$–$

.² We also need the crucial delineators, which in arithmetic expressions are just the parentheses

$($

and

$)$

.

Thus, we have:

Alphabet of

$\mathcal{L}_{AE}$ : the symbols

$0, 1, 2, 3, 4, 5, 6, 7, 8, 9$ ,

$+$ ,

$\times$ ,

$–$ ,

$($ , and

$)$ .

Base Rule of

$\mathcal{L}_{AE}$ : every integer (formula of

$\mathcal{L}_\mathbb{Z}$ ) is an arithmetic expression.

Productive Rule of

$\mathcal{L}_{AE}$ : if

$X$ and

$Y$ are arithmetic expressions, then

$(X+Y)$ ,

$(X-Y)$ , and

$(X \times Y)$ are arithmetic expressions.

Now as before, for any arithmetic expression we can come up with, we can prove that they are formulas of

$\mathcal{L}_{AE}$

. For example, take

$((414 \times -14134)\times (835+345))$

.

(1)	$414$	(BR)
(2)	$- 14134$	(BR)
(3)	$(414 \times - 14134)$	(PR: 1, 2)
(4)	$835$	(BR)
(5)	$345$	(BR)
(6)	$(835 + 345)$	(PR: 4, 5)
(7)	$((414 \times - 14134) \times (835 + 345))$	(PR: 3, 6)

Or another one of a different form:

$(34 \times ((99- -36)\times 9))$

.

(1)	$99$	(BR)
(2)	$- 36$	(BR)
(3)	$(99 - - 36)$	(BR)
(4)	$9$	(BR)
(5)	$((99 - - 36) \times 9)$	(PR: 3, 4)
(6)	$34$	(PR: 4, 5)
(7)	$(34 \times ((99 - - 36) \times 9))$	(PR: 6, 5)

Let us continue our discussion of the structure of our newfound expressions. In arithmetic expressions, delineators are used to give an order to the computation represented by the formulas. Sometimes, the order in which these operations are carried out does not matter, but many times, it makes a crucial difference. For example,

$(4+5)+3$

computes to the same number as

$4+(5+3)$

(namely,

$12$

), but

$(4-5)-3$

does not compute to the same number as

$4-(5-3)$

. In fact, the first one is a negative number,

$-4$

, while the second is a positive number,

$2$

. So it is clearly very important to place parentheses in the right places.

Exercise 2.21. Give an example of an arithmetic expressions with three (not necessarily distinct) numbers connected by two (not necessarily distinct) arithmetic operations where how the parentheses are placed does not matter. Then, give another example where how the parentheses are placed does matter. Don’t forget to write down your reasoning in each case.

What is not very helpful about the linear derivations that we have been using so far is that in their line-by-line representation, they do not really show us visually what the structure of our expressions is. However, this can be remedied by using so-called syntactic trees to represent how formulas are formed.

Note that in each derivation, we first take the base expressions, and then form more complex expressions, and then more complex expressions from previous expressions, until we get to the desired formula. Moreover, more complex expressions are formed by putting together two simpler expressions with an arithmetic operator, flanked by the left and right parentheses.

Let’s return to

$((414 \times -14134)\times (835+345))$

as our example. Then, we could represent this in tree-form as follows:

As you can see, this type of representation shows you the starting points, which we get by the Base Rule. These are at the bottom. Then, as we build up the more complex formulas, it shows how we put together those simpler expressions to get the more complex ones.

Our other example,

$(34 \times ((99- -36)\times 9))$

, will result in a different looking tree. Namely:

You can also read these trees from top to bottom. In fact, for some reason, mathematicians call the highest formula the root of the tree, from which it branches. The lowest points (to which no other points are connected) are the tree’s leaves. Finally, any path from the root to one of the leaves is a branch. So really, the tree is upside-down!

At any rate, if reading the tree from bottom-to-top tells you how a formula is formed, reading it from top-to-bottom tells you how a formula can be broken down into its constituent parts.

You might be wondering: which formulas should I take as constituent when a tree branches? For example, why is it the case that in the first example, we branched to

$(414 \times -14134)$

and

$(835+345)$

, while in the second example, we branched to

$34$

and

$((99- -36)\times 9)$

? The answer: the parentheses tell us. At each step, there is a main arithmetic operator, which connects together the two formulas we branch to. Other arithmetic operators occur inside nested parentheses, and are to be broken down at a later step.

Exercise 2.22. Decide which is the main operator in each of these formulas:

$((93+73)\times(43-15))$
$(975 \times (44+1))$
$((77\times 3)- 33)$
$(((3 \times (5+4)) \times 9)-7)$
$((((3 \times 5)+4) \times 9)-7)$
$(((3 \times 5)+4) \times (9-7))$

When you are drawing a syntactic tree, it is useful to start with the formula you are aiming to construct. Then, at each step, you have to find the main operator, and put the two formulas it connects on separate branches below it. If you repeat this process enough times, each tree will have integers on its leaves. If some leaf is not an integer, you have to continue.

Exercise 2.23. Construct syntax trees for each of the six formulas in Exercise 2.22. Observe the difference between the last three formulas, which only differ in their structure (the way the parentheses are distributed).

As discussed above, parentheses determine the order of computation for each arithmetic expression. And in fact, when you look at syntax trees for these expressions, you can read off the order of computation from their structure. In order to compute any arithmetic expression, you need to start from the leaves, which are integers. Then, the next thing you have to compute is the expression immediately above. Once computed, you can move to the next level, substituting the result of the computation for the occurrence of the expression in the more complex formula above, then computing again. Repeating these steps until you get to the top will give you the final result.

This description is probably a bit confusing on first read, so here is a simple example with

$(3+5) \times (6-2)$

. The starting point is the whole tree:

Expression tree of ((3 + 5) × (6 - 2)) breaking down to 3 plus 5 and 6 minus 2. (3+5) is further broken down to numbers 3 and 5. (6-2) is further broken down to numbers 6 and 2.

Then, take the left side first and compute

$(3+5)$

, resulting in:

The image shows a mathematical expression tree for ((3 + 5) times (6 - 2)), breaking it down into 8 and (6-2). (6-2) is then broken down into numbers 6 and 2.

We can then substitute the results of our computation in the more complex formula above. This will result in:

Expression tree of ((3 + 5) × (6 - 2)) broken down to numbers 8 and 4.

Accordingly, the answer is

$32$

.

License

Icon for the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License

The language ℒf\mathcal{L}_f

The language ℒAE\mathcal{L}_{AE}

License

Share This Book

The language $\mathcal{L}_f$

The language $\mathcal{L}_{AE}$