Alphabets, formulas, languages

Peter Susanszky

Alphabets, formulas, languages

So far, our formulas are not very interesting. They are just black circles one after another. So let’s make some generalizations that allow us to produce more interesting formulas. The first thing we can do is add some more symbols. Since these symbols are just the basic building blocks of more complicated expressions, usually, they are called the alphabet.

The English alphabet consists of 26 symbols (normal people call these ‘letters’). Ours so far consists of one symbols, ●. There is absolutely no limit as to how many symbols you can have in your alphabet. For now, we shall add 3 more symbols, and declare our alphabet.

Alphabet: our alphabet consists of the following symbols:

●, ◻, ★, △

Then, we can change our rules so that the base rule, Rule 1, lets us put down any member of the alphabet by itself, and similarly, so that our productive rule, Rule 2R, lets us put down any member of the alphabet to the right of the formula we already have.

We can also be more specific about what we are doing when we are giving these rules. For in fact, what we are really doing is defining what a formula is! So we can write:

Base Rule: Any member of the alphabet by itself is a formula.

Productive Rule: If

$X$ is a formula and

$Y$ is a member of the alphabet, then

$X Y$ is a formula.

This may be confusing at first. What is

$X$

? Well,

$X$

is any formula! What is

$Y$

? Well,

$Y$

is any member of our alphabet! But how do we know what counts as a formula (here,

$X$

) if we are just now defining it? You already know the answer to this. We build formulas in steps, and at each step, we get a new formula. So anything we can construct with our rules is a formula.

Let’s look at an example carefully. Suppose that you want to determine whether ◻●◻△ is a formula in our new system. In order to make sure that it is, we need to show that it is, using our rules. As a first step, we know that ◻ by itself is a formula, since it is a member of the alphabet (this is the Base Rule). In turn, using the Productive Rule, we can infer that there are at least four other formulas. Namely:

◻●, ◻△, ◻★, ◻◻.

This is so because we know ◻ is a formula already, and the Productive Rule tells us that if

$X$

(here

$X=$

◻) is a formula and

$Y$

is a member of the alphabet, then

$XY$

, the result of putting ◻ first, then putting down whatever

$Y$

we want from the alphabet, is also a formula. So in particular, ◻● is a formula (when

$Y=$

●).

Now we can use this reasoning again, but this time, the

$X$

in our Productive Rule is ◻●, and

$Y$

needs to be ◻. So then by the rule,

$XY$

, that is, ◻●◻ is a formula. Finally, using the Productive Rule again with

$X=$

◻●◻ and

$Y=$

△, we can see that ◻●◻△ is a formula too.

Of course, this is very wordy. But we already know how to make this reasoning more compact. Like this:

(1)	◻	(Base Rule)
(2)	◻●	(Productive Rule: 1)
(3)	◻●◻	(Productive Rule: 2)
(4)	◻●◻△	(Productive Rule: 3)

We call this a formal derivation, or simply, derivation of the formula ◻●◻△ in the language. Indeed, this is a formal proof that the sequence of symbols ◻●◻△ is a formula of the language.

Notice that we are now referring not only to the rule, but to the line number on which the rule is used. This is so since the Productive Rule needs two ‘inputs’: something we already know to be a formula, the other a member of the alphabet. And what we know to be a formula is what appears on one of the previous lines.

Finally, it is a good idea to have some general term to refer to combinations of alphabets and rules of construction (either of the base type or the productive type) to not get lost in all these different combinations. In the literature, these are called different languages. We will come to see why this is the case in due time (though you may already see the connections). At any rate, languages are usually referred to by the fancy letter ‘l’ like this:

$\mathcal{L}$

. And when there is more than one, we can use subscripts or superscripts (or both) to distinguish them.

Here is a nice table of the languages we considered so far:

Language	Alphabet	Rules
$\mathcal{L}_1$	●	Rule 1, Rule 2L, Rule 2R
$\mathcal{L}_2$	●	Rule 1, Rule 2R
$\mathcal{L}_3$	●, ◻, △, ★	Base Rule, Productive Rule

Note that languages as we defined them are identified by their alphabet and the rules for constructing their formulas. This means that two distinct languages
may have the exact same formulas, even if they are constructed through different rules.

Exercise 2.5. Consider the languages

$\mathcal{L}_1$ ,

$\mathcal{L}_2$ , and

$\mathcal{L}_3$ . Which of these three languages share all their formulas?

Exercise 2.6. Consider the languages

$\mathcal{L}_2$ and

$\mathcal{L}_3$ . Is it true that all the formulas of

$\mathcal{L}_1$ are formulas of

$\mathcal{L}_2$ ? What about the converse: is it true that all the formulas of

$\mathcal{L}_2$ are formulas of

$\mathcal{L}_1$ ?

Sometimes, it is useful to abstract away from the specific symbols of a language, and just concentrate on the ‘roles’ they play in the writing game. We already discussed this above as the idea relates to chess. Namely, it doesn’t matter what forms the chess pieces take as long as they retain the role assigned to them by the rules of the game.

We can formulate this idea related to our languages as follows. If we have a language with a certain alphabet consisting of

$n$

distinct symbols, we can replace that alphabet with an alternative one, provided it also has

$n$

distinct symbols, and we specify which symbol is exchanged for which new symbol. Then, we can call our new language an alphabetic variant of the old one.

What the idea of an alphabetic variant of a language captures is that we are really playing the same writing game, but with different looking pieces. Here is an alphabetic variant of

$\mathcal{L}_3$

: change ● to the letter A, change ◻ to the letter E, change △ to the letter S, and change ★ to the letter T. The two rules, Base Rule and Productive Rule are the same as before. You can call this language

$\mathcal{L}_4$

:

Language	Alphabet	Rules
$\mathcal{L}_4$	A, E, S, T	Base Rule, Productive Rule

Exercise 2.7 (The English word game). Here is a new game. Write down as many formulas of the language

$\mathcal{L}_4$ as you can that coincide with English words. An example is: EATS. Make sure that you are capable of specifying how each word can be derived using the alphabet and the two rules of

$\mathcal{L}_4$

Exercise 2.8. Make your own alphabetic variant of one of the languages above. You can use whatever symbols you’d like. Then, give 5 example formulas from your new language.

Exercise 2.9. Consider the language

$\mathcal{L}_{EA}$ , the English Alphabet language. Unsurprisingly, the alphabet of

$\mathcal{L}_{EA}$ is the whole (uppercase) English alphabet. The two rules are still the Base Rule and the Productive Rule. Answer the following questions:

Are there more formulas of
$\mathcal{L}_{EA}$ than English words, or are there more English words than formulas of

$\mathcal{L}_{EA}$ ? In the first case, you should be able to give a formula that is not a word, in the second case, you should be able to give a word that is not a formula.
Are the following expressions formulas of
ℒEA\mathcal{L}_{EA}?
1. COMPUTER
2. A.I.
3. BLACKBOARD
4. BLACK BOARD
5. RUN!
Is
$\mathcal{L}_{EA}$ an alphabetic variant of any of the languages

$\mathcal{L}_1$ ,

$\mathcal{L}_2$ ,

$\mathcal{L}_3$ ,

$\mathcal{L}_4$ ? Don’t forget to give an explanation!

To finish this chapter, let’s define a language which properly captures a class of expressions we use in everyday life. By ‘properly’, I mean that every such expression will be a formula of the language, and every formula of the language will be such an expressions. These expressions will be the positive natural numbers, so let’s call our language

$\mathcal{L}_{\mathbb{N}^+}$

(where

${\mathbb{N}^+}$

is the symbol for the positive natural numbers).

Every positive natural number is a finite sequence of the numbers

$0$

,

$1$

,

$2$

,

$3$

,

$4$

,

$5$

,

$6$

,

$7$

,

$8$

,

$9$

with the exception that no number starts with

$0$

. Accordingly, our alphabet will be the following:

Alphabet of

$\mathcal{L}_{\mathbb{N}^+}$ : The alphabet consists of the symbols

$0$ ,

$1$ ,

$2$ ,

$3$ ,

$4$ ,

$5$ ,

$6$ ,

$7$ ,

$8$ ,

$9$ .

Then, we have the following slightly modified Base Rule:

Base Rule of

$\mathcal{L}_{\mathbb{N}^+}$ : Every symbol of the alphabet of

${\mathbb{N}^+}$ is a formula except

$0$ .

On the other hand, we don’t need to change the Productive Rule, since the Base Rule takes care of any exceptions. Thus:

Productive Rule of

$\mathcal{L}_{\mathbb{N}^+}$ : If

$X$ is a formula and

$Y$ is a member of the alphabet, then

$X Y$ is a formula.

Exercise 2.10. Explain why the language

$\mathcal{L}_{\mathbb{N}^+}$

is capable of producing all positive natural numbers as formulas, and why it is incapable of producing a formula that is not a positive natural number. In your explanation, mention why there can be no formula produced that starts with a sequence of zeroes (a sequence of the symbol

$0$

).
Exercise 2.11. Think of a way to change the Base Rule and the Productive Rule for

$\mathcal{L}_{\mathbb{N}^+}$

so that all the non-negative integers (the positive natural numbers plus

$0$

) are produced as formulas, but no ‘unnatural numbers’ like

$000323$

are produced.

–STOP–

Here is one way to answer Exercise 2.11 and define the language

$\mathcal{L}_\mathbb{N}$

. Change the base rule so that it does allow for having each symbol as a formula by itself.

Base Rule of

$\mathcal{L}_\mathbb{N}$ : Every symbol of the alphabet of

${\mathbb{N}^+}$ is a formula.

Then, we make sure that the Productive Rule does not allow construction from

$0$

, only from the other symbols as follows:

Productive Rule of

$\mathcal{L}_\mathbb{N}$ : If

$X$ is a formula other than

$0$ and

$Y$ is a member of the alphabet, then

$X Y$ is a formula.

Exercise 2.12. It is obvious that the new rule does not allow the construction of formulas like

$05$

. Does it allow the construction of longer ‘unnatural’ numbers like

$00304$

? Check that this is indeed impossible with the above rules.

Exercise 2.13. A trickier question. After you have a specification for the language

$\mathcal{L}_\mathbb{N}$

, can you give one for

$\mathcal{L}_\mathbb{Z}$

, which consists of all integers? Note: integers are the positive and negative natural numbers, plus zero.

License

Icon for the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License

License

Share This Book