
1.5

Foundations of Language Theory

We now begin to lay the mathematical foundations of languages that we will use throughout the rest of this
book. Our viewpoint is that a language is a set of strings. In turn, a string is a finite sequence of letters from some
alphabet. These concepts are defined rigorously as follows.
Definition 1.5.1 An alphabet is any finite set. We will usually use the symbol $\Sigma$ to represent an alphabet
and write $\Sigma = \{a_1, \ldots, a_k\}$. The $a_i$ are called the symbols of the alphabet.
Definition 1.5.2 A string (over $\Sigma$) is a function $u : \{1, \ldots, n\} \to \Sigma$ or the function $\epsilon : \emptyset \to \Sigma$. The latter
is called the empty string or null string and is sometimes denoted by $\epsilon$, $\lambda$, $e$ or $1$. If a string is non-empty
then we may write it by listing the elements of its range in order.
Example 1.5.3 Let $\Sigma = \{a, b\}$ and define $u : \{1, 2, 3\} \to \Sigma$ by $u(1) = a$, $u(2) = b$ and $u(3) = a$. We write this string as
$aba$.
The set of all strings over $\Sigma$ is denoted by $\Sigma^*$. Thus $\{a\}^* = \{a^n \mid n = 0, 1, \ldots\}$, where we have introduced the
convention that $a^0 = \epsilon$. Observe that $\{a\}^*$ is a countable set.
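To make the countability claims concrete, here is a minimal Python sketch. It assumes that letters are single characters, that strings are Python `str` values and that the empty string $\epsilon$ is `""`. The hypothetical generator `enumerate_strings` lists all strings over a finite alphabet in shortlex order (first by length, then letter by letter), so every string turns up at some finite position. This illustrates, but does not prove, that such sets are countable.

```python
from itertools import count, islice, product

def enumerate_strings(alphabet):
    """Yield every string over `alphabet` exactly once, in shortlex order:
    all strings of length 0, then length 1, then length 2, and so on."""
    for n in count(0):                        # n = 0, 1, 2, ...
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)            # n = 0 yields the empty string ""

# The first ten strings over {a, b}:
print(list(islice(enumerate_strings("ab"), 10)))
# ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb', 'aaa', 'aab', 'aba']
```

The position at which a string is produced gives an injection from the set of strings into the natural numbers.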
It is a useful convention to use letters from the beginning of the alphabet to represent single letters and
letters from the end of the alphabet to represent strings.
Warning Although letters like $a$ and $b$ are used to represent specific elements of an alphabet, they may also
be used to represent variable elements of an alphabet, i.e. one may encounter a statement like "Suppose that
$\Sigma = \{0, 1\}$ and let $a \in \Sigma$."

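A language (over $\Sigma$) is a subset of $\Sigma^*$. Concatenation is a binary operation $\cdot$ on the strings over a given
alphabet $\Sigma$. If $u : \{1, \ldots, m\} \to \Sigma$ and $v : \{1, \ldots, n\} \to \Sigma$ then we define $u \cdot v : \{1, \ldots, m + n\} \to \Sigma$ as
$u(1) \ldots u(m) v(1) \ldots v(n)$, or
$$(u \cdot v)(i) = \begin{cases} u(i) & \text{for } 1 \le i \le m, \\ v(i - m) & \text{for } m + 1 \le i \le m + n. \end{cases}$$
If $u$ is the empty string then we define $u \cdot v = v$, and similarly $u \cdot v = u$ if $v$ is the empty string. Generally the dot
will not be written between letters.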
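The piecewise definition of concatenation can be transcribed almost literally. The sketch below is only an illustration under an assumed encoding: a string is a Python `dict` mapping positions $1, \ldots, n$ to letters, and the empty string is the empty `dict` (the function with empty domain). The helpers `from_word`, `to_word` and `concat` are hypothetical names introduced here.

```python
def from_word(word):
    """Build the function {1, ..., n} -> Sigma for a Python str."""
    return {i: letter for i, letter in enumerate(word, start=1)}

def to_word(u):
    """List the range of u in order, e.g. {1: 'a', 2: 'b'} -> 'ab'."""
    return "".join(u[i] for i in range(1, len(u) + 1))

def concat(u, v):
    """(u . v)(i) = u(i) for 1 <= i <= m, and v(i - m) for m + 1 <= i <= m + n."""
    m, n = len(u), len(v)
    w = {}
    for i in range(1, m + n + 1):
        w[i] = u[i] if i <= m else v[i - m]
    return w

u = from_word("ab")
v = from_word("ba")
print(to_word(concat(u, v)))         # abba
print(concat({}, v) == v)            # True: the empty string acts as an identity
```

With this encoding the formula already gives $u \cdot v = v$ when $m = 0$ (and $u \cdot v = u$ when $n = 0$), so the empty-string cases need no separate branch.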

Remarks Concatenation is not commutative, e.g. $(ab)(bb) \neq (bb)(ab)$. But it is true that for any string $u$,
$u^n u^m = u^m u^n$. Concatenation is associative, i.e. $u(vw) = (uv)w$.
$u$ is a prefix of $v$ if there exists $y$ such that $v = uy$. $u$ is a suffix of $v$ if there exists $x$ such that $v = xu$.
$u$ is a substring of $v$ if there exist $x$ and $y$ such that $v = xuy$. We say that $u$ is a proper prefix (suffix,
substring) of $v$ iff $u$ is a prefix (suffix, substring) of $v$ and $u \neq v$.
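A minimal Python sketch of these relations, assuming strings are Python `str` values. The function names are hypothetical; each one checks the existential condition of its definition directly by slicing. Python's built-in `str.startswith`, `str.endswith` and the `in` operator compute the same three relations.

```python
def is_prefix(u, v):
    """u is a prefix of v iff v = u y for some string y."""
    return v[:len(u)] == u

def is_suffix(u, v):
    """u is a suffix of v iff v = x u for some string x."""
    return len(u) <= len(v) and v[len(v) - len(u):] == u

def is_substring(u, v):
    """u is a substring of v iff v = x u y for some strings x, y:
    try every possible length of x."""
    return any(v[i:i + len(u)] == u for i in range(len(v) - len(u) + 1))

def is_proper(relation, u, v):
    """Proper version of a relation: the relation holds and u != v."""
    return relation(u, v) and u != v

print(is_proper(is_prefix, "ab", "abba"))     # True
print(is_proper(is_substring, "ab", "ab"))    # False
```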
The prefix, suffix and substring relations are each partial orders on $\Sigma^*$ (their proper versions are strict partial orders). If $\Sigma$ is a totally ordered set, e.g. $\Sigma =
\{a_1 < \cdots < a_n\}$, then there is a natural extension to a total order on $\Sigma^*$, called the lexicographic ordering. We
define $u \le v$ if $u$ is a prefix of $v$ or there exist $x, y, z \in \Sigma^*$ and $a_i, a_j \in \Sigma$ such that $a_i < a_j$ in the order of $\Sigma$,
$u = x a_i y$ and $v = x a_j z$.
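The lexicographic ordering can be decided by looking for the first position where the two strings differ. Here is a minimal Python sketch, assuming strings are `str` values and that the order on $\Sigma$ is given as a string `order` listing the letters from smallest to largest (a hypothetical convention). The function name `lex_leq` is also hypothetical.

```python
def lex_leq(u, v, order):
    """Return True iff u <= v in the lexicographic order on Sigma*.

    `order` lists the alphabet from smallest to largest, e.g. "ab" means a < b.
    """
    rank = {letter: i for i, letter in enumerate(order)}
    if v[:len(u)] == u:               # u is a prefix of v
        return True
    # Otherwise find the first position where they differ: u = x a_i y, v = x a_j z.
    for a_i, a_j in zip(u, v):
        if a_i != a_j:
            return rank[a_i] < rank[a_j]
    return False                      # v is a proper prefix of u, so v < u

print(lex_leq("aab", "ab", "ab"))     # True: first difference is a < b
print(lex_leq("ab", "aab", "ab"))     # False
print(lex_leq("b", "a", "ba"))        # True when the alphabet is ordered b < a
```

When `order` lists the letters in Unicode code point order, this agrees with Python's built-in `<=` on `str`. Note that, with $a < b$, this order has infinite descending chains such as $b > ab > aab > \cdots$.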

Exercises
1 Given a string $w$, its reversal $w^R$ is defined inductively as follows: $\epsilon^R = \epsilon$, $(ua)^R = a u^R$, where $a \in \Sigma$.
Also, recall that $u^0 = \epsilon$ and $u^{n+1} = u^n u$. Prove that $(w^n)^R = (w^R)^n$.
2 Suppose that $a$ and $b$ are two different members of an alphabet. Prove that $ab \neq ba$.
3 Suppose that $u$ and $v$ are non-empty strings over an alphabet. Prove that if $uv = vu$ then there is a
string $w$ and natural numbers $m, n$ such that $u = w^m$, $v = w^n$.

4 Prove that for any alphabet $\Sigma$, $\Sigma^*$ is a countable set.


5 Lurking behind the notions of alphabet and language is the idea of a semi-group, i.e. a set equipped with
an associative law of composition. A semi-group that also has an identity element is called a monoid, and $\Sigma^*$ is the free monoid over $\Sigma$. Is a given
language over $\Sigma$ necessarily a semi-group?
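The reversal and power in Exercise 1 are both defined by recursion, and such definitions transcribe directly into code. The sketch below is a minimal illustration assuming strings are Python `str` values; `reverse` and `power` are hypothetical names. It then runs a brute-force check of the identity $(w^n)^R = (w^R)^n$ on short strings over $\{a, b\}$. Testing finitely many cases is no substitute for the inductive proof the exercise asks for.

```python
from itertools import product

def reverse(w):
    """w^R by the inductive definition: eps^R = eps, (u a)^R = a u^R."""
    if w == "":
        return ""
    u, a = w[:-1], w[-1]              # split w as u a with a a single letter
    return a + reverse(u)

def power(w, n):
    """w^0 = eps, w^(n+1) = w^n w."""
    result = ""
    for _ in range(n):
        result = result + w
    return result

# All strings over {a, b} of length at most 3.
words = ["".join(p) for k in range(4) for p in product("ab", repeat=k)]

# Check (w^n)^R == (w^R)^n for these words and n = 0, ..., 3.
assert all(reverse(power(w, n)) == power(reverse(w), n)
           for w in words for n in range(4))
```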

1.6

Operations on Languages

A way of building more complex languages from simpler ones is to combine them using various operations.
We have already seen the union and intersection operations.
Given some alphabet $\Sigma$, for any two languages $S, T$ over $\Sigma$, the difference $S - T$ of $S$ and $T$ is the language
$$S - T = \{w \in \Sigma^* \mid w \in S \text{ and } w \notin T\}.$$

The difference is also called the relative complement. A special case of the difference is obtained when
$S = \Sigma^*$, in which case we define the complement $\overline{L}$ of a language $L$ as
$$\overline{L} = \{w \in \Sigma^* \mid w \notin L\}.$$
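For finite languages stored as Python sets of `str`, the difference is ordinary set difference. The complement is relative to $\Sigma^*$, which is infinite, so it cannot be stored as a finite set. The sketch below therefore computes only the part of $\overline{L}$ consisting of strings of length at most `max_len`; that cut-off, and the names `difference` and `complement_upto`, are assumptions made for illustration.

```python
from itertools import product

def difference(S, T):
    """S - T = {w | w in S and w not in T}."""
    return {w for w in S if w not in T}

def complement_upto(L, alphabet, max_len):
    """Strings of length <= max_len over `alphabet` that are not in L.

    The true complement Sigma* - L is infinite whenever the alphabet is
    non-empty and L is finite, so only a length-bounded slice is built.
    """
    universe = {"".join(p) for k in range(max_len + 1)
                for p in product(alphabet, repeat=k)}
    return difference(universe, L)

print(sorted(difference({"a", "ab", "b"}, {"b", "ba"})))   # ['a', 'ab']
print(sorted(complement_upto({"", "a"}, "ab", 2)))         # ['aa', 'ab', 'b', 'ba', 'bb']
```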
The above operations do not make use of the structure of strings. The following operations make use of
concatenation.
Definition 1.6.1 Given an alphabet $\Sigma$, for any two languages $S, T$ over $\Sigma$, the concatenation $ST$ of $S$ and
$T$ is the language
$$ST = \{w \in \Sigma^* \mid \exists u \in S,\ \exists v \in T,\ w = uv\}.$$

For any language $L$, we define $L^n$ as follows:
$$L^0 = \{\epsilon\}, \qquad L^{n+1} = L^n L.$$
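For finite languages these two definitions can be computed directly. A minimal Python sketch, assuming a language is a Python `set` of `str` and $\epsilon$ is `""`; the names `concat_lang` and `lang_power` are hypothetical. The power is built by the same recursion, starting from $L^0 = \{\epsilon\}$.

```python
def concat_lang(S, T):
    """ST = {uv | u in S, v in T}."""
    return {u + v for u in S for v in T}

def lang_power(L, n):
    """L^0 = {eps}, L^(n+1) = L^n L."""
    result = {""}
    for _ in range(n):
        result = concat_lang(result, L)
    return result

print(sorted(lang_power({"a", "b"}, 2)))   # ['aa', 'ab', 'ba', 'bb']
print(lang_power({"a", "b"}, 0))           # {''}
print(concat_lang({"a"}, set()))           # set()
```

The last line illustrates $L\emptyset = \emptyset$: with no right-hand factor available, no string is produced.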

Example 1.6.2 If $S = \{a, b, ab\}$, $T = \{ba, b, ab\}$ and $U = \{a, a^2, a^3\}$ then
$$S^2 = \{aa, ab, aab, ba, bb, bab, aba, abb, abab\},$$
$$T^2 = \{baba, bab, baab, bba, bb, abba, abb, abab\},$$
$$U^2 = \{a^2, a^3, a^4, a^5, a^6\},$$
$$ST = \{aba, ab, aab, bba, bab, bb, abba, abb, abab\}.$$


Notice that even though $S$, $T$ and $U$ have the same number of elements, their squares all have different
numbers of elements (9, 8 and 5 respectively). See the exercises for more on this funny phenomenon.
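The sizes in Example 1.6.2 can be reproduced mechanically. This self-contained Python sketch assumes languages are sets of `str` and uses a hypothetical helper `concat_lang`. The collisions are visible in the counts: in $T^2$ the string $bab$ arises both as $ba \cdot b$ and as $b \cdot ab$, and $ST$ turns out to have nine elements, including $abab = ab \cdot ab$.

```python
def concat_lang(S, T):
    """ST = {uv | u in S, v in T}."""
    return {u + v for u in S for v in T}

S = {"a", "b", "ab"}
T = {"ba", "b", "ab"}
U = {"a", "aa", "aaa"}

for name, L in [("S", S), ("T", T), ("U", U)]:
    print(name, len(L), len(concat_lang(L, L)))
# S 3 9
# T 3 8
# U 3 5

print(sorted(concat_lang(S, T)))
# ['aab', 'ab', 'aba', 'abab', 'abb', 'abba', 'bab', 'bb', 'bba']
print(concat_lang(S, T) == concat_lang(T, S))   # False
```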
Multiplication of languages has lots of nice properties, such as $L\emptyset = \emptyset$ and $L\{\epsilon\} = L$.
In general, however, $ST \neq TS$.
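So far, all of the operations that we have introduced except complementation preserve the finiteness of languages (the complement of a finite language over a non-empty alphabet is infinite). This is not the
case for the next two operations.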


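Definition 1.6.3 Given an alphabet $\Sigma$, for any language $L$ over $\Sigma$, the Kleene $*$-closure $L^*$ of $L$ is the
infinite union
$$L^* = L^0 \cup L^1 \cup L^2 \cup \cdots \cup L^n \cup \cdots.$$
The Kleene $+$-closure $L^+$ of $L$ is the infinite union
$$L^+ = L^1 \cup L^2 \cup \cdots \cup L^n \cup \cdots.$$
Since $L^1 = L$, both $L^*$ and $L^+$ contain $L$. Also, notice that since $L^0 = \{\epsilon\}$, the language $L^*$ always contains
$\epsilon$, and we have
$$L^* = L^+ \cup \{\epsilon\}.$$
However, if $\epsilon \notin L$, then $\epsilon \notin L^+$.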
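Because $L^*$ is infinite as soon as $L$ contains a non-empty string, a program can only list a finite part of it. The Python sketch below assumes $L$ is a finite set of `str` and, as an illustrative cut-off, keeps only members of length at most `max_len`; the names `kleene_star_upto` and `kleene_plus_upto` are hypothetical. The star is computed as a worklist search that extends known members of $L^*$ by one more factor from $L$, which terminates even when $\epsilon \in L$. The plus uses the identity $L^+ = L^* L$.

```python
def kleene_star_upto(L, max_len):
    """All strings of L* with length <= max_len (L a finite set of str).

    Every prefix u1...ui of a member u1...uk of L* is itself in L* and is
    no longer than the whole string, so the search reaches every such member.
    """
    result = {""}                         # L^0 = {eps}
    frontier = [""]
    while frontier:
        w = frontier.pop()
        for u in L:
            x = w + u                     # one more factor from L
            if len(x) <= max_len and x not in result:
                result.add(x)
                frontier.append(x)
    return result

def kleene_plus_upto(L, max_len):
    """Strings of L+ = L* L with length <= max_len."""
    return {w + u for w in kleene_star_upto(L, max_len) for u in L
            if len(w + u) <= max_len}

print(sorted(kleene_star_upto({"ab"}, 5)))   # ['', 'ab', 'abab']
print(sorted(kleene_plus_upto({"ab"}, 5)))   # ['ab', 'abab']
```

The second line shows the remark above in action: $\epsilon \notin \{ab\}$, so $\epsilon$ does not appear in the computed part of $L^+$.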

Remark $\Sigma^*$ has already been defined when $\Sigma$ is an alphabet. Modulo some set theory, the Kleene $*$-closure
of $\Sigma$ coincides with this previous definition if we view $\Sigma$ as a language over itself (each letter being a string of length one). Therefore the Kleene
$*$-closure is an extension of our original $*$ operation.

Exercises
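1 Prove the following identities:
(i) $L\emptyset = \emptyset$,
(ii) $\emptyset L = \emptyset$,
(iii) $L\{\epsilon\} = L$,
(iv) $\{\epsilon\}L = L$,
(v) $(S \cup \{\epsilon\})T = ST \cup T$,
(vi) $S(T \cup \{\epsilon\}) = ST \cup S$,
(vii) $L^n L = L L^n$,
(viii) $\emptyset^* = \{\epsilon\}$,
(ix) $L^+ = L^* L$,
(x) $L^{**} = L^*$,
(xi) $L^* L^* = L^*$.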
2 Given a language $L$ over $\Sigma$, we define the reverse of $L$ as $L^R = \{w^R \mid w \in L\}$. For each of the following,
either prove equality or provide a counterexample. Which of the false equalities can be made true by
replacing $=$ with a containment sign $\subseteq$?
(i) $(S \cup T)^R = S^R \cup T^R$;
(ii) $(ST)^R = T^R S^R$;
(iii) $(L^*)^R = (L^R)^*$;
(iv) $(S \cup T)^* = S^* \cup T^*$;
(v) $(ST)^* = T^* S^*$.

3 Prove that if $LL = L$ then $L$ contains the empty string or $L = \emptyset$.


4 Suppose that $S$ and $T$ are languages over a two-letter alphabet. If $ST = TS$, is $S = T$?
5 Does $A^* = B^*$ imply that $A = B$? Find a counterexample or provide a proof.