Binary Tree Code Words As Context-Free Languages: The Computer Journal June 1998

Article in The Computer Journal · June 1998

DOI: 10.1093/comjnl/41.6.422 · Source: OAI
Binary Tree Code Words as
Context-Free Languages
ERKKI MÄKINEN
Department of Computer Science, University of Tampere, PO Box 607, FIN-33101 Tampere, Finland
Email: em@cs.uta.fi
Given a binary tree coding system, the set of valid code words of all binary trees can be considered
as a context-free language over the alphabet used in the coding system. The complexity of the
language obtained varies from one coding system to another. Xiang, Tang and Ushijima have
recently proved some properties of such languages. We show that their results can be more easily
proved by noticing the form of the generating grammar in question. Namely, in the simplest cases
the language obtained is a left Szilard language, a very simple deterministic language. Moreover,
we prove some new results concerning the ‘binary tree languages’.
Received February 2, 1998; revised September 17, 1998
1. INTRODUCTION

Different binary tree coding systems have been introduced (for a survey, see [1]) and compared [2, 3, 4, 5] in the literature. Given a coding system, the set of valid code words forms a language over the alphabet used in the code words.

Xiang et al. [5] have recently studied the properties of the languages generated by the following two context-free grammars (with symbols renamed here)

G1 : S → aSS, S → bS, S → cS, S → d

and

G2 : S → aSS, S → b.

G1 produces valid code words in a system where a node with two children is labelled with a, a node with only a left (respectively, right) child with b (respectively, c), and a leaf with d. The code word is then obtained by reading the labels in preorder. Variants of this coding system have also been studied by Korsh [6] and Bapiraju and Bapeswara Rao [7].

The coding system related to G2 distinguishes only between internal nodes and leaves. The former are labelled with a's and the latter with b's. Again, the labels are read in preorder. The code words obtained are often called Zaks' sequences, since they were introduced by Zaks in [8] (see also [9]).

In what follows we study G1 and G2 and other context-free grammars generating valid binary tree code words. Xiang et al. [5] have proved several theorems concerning the languages L(G1) and L(G2) generated by G1 and G2. We show that these results follow directly from the fact that G1 and G2 are so-called left Szilard grammars obeying a certain strict form of context-free determinism.

We also consider other binary tree coding systems and the corresponding languages of their valid code words.

2. PRELIMINARIES

Concerning context-free grammars and languages, we mainly follow the notations and definitions of [10]. Similarly, we use the standard tree terminology [11].

Let G = (V, Σ, P, S) be a context-free grammar (hereafter simply 'grammar') whose productions are uniquely labelled by the symbols of an alphabet C. If a production A → α is associated with the label φ we write φ : A → α. The production labelled with φ is called the φ-production. If a sequence σ of labelled productions is applied in a leftmost derivation β ⇒* γ, we write β ⇒^σ γ. Notice that we consider leftmost derivations only and omit the normal subscript indicating leftmost derivations. The left Szilard language Szl(G) of G is defined as [12]

Szl(G) = {σ ∈ C* | S ⇒^σ w, w ∈ Σ*}.

We consider reduced [10] grammars only, i.e. grammars in which each nonterminal and terminal symbol appears in some terminal derivation from the start symbol to a terminal string. A production is called terminating if there are no nonterminals in its right-hand side. Otherwise, a production is continuing. If w is a word over an alphabet Σ and a is a symbol in Σ then a(w) stands for the number of a's in w. The empty word is denoted by λ.

Given a grammar G, a grammar generating Szl(G) can be obtained by replacing each production φ : A → α in P by the production A → φη(α), where η is a homomorphism erasing terminal symbols. The grammar obtained has the property that each production has a unique terminal symbol in the beginning of its right-hand side and this is the only terminal in the right-hand side. We refer to such grammars as left Szilard grammars. A left Szilard grammar is always unambiguous. We have a one-to-one correspondence between productions in the original grammar G, the labels indicating the productions and the productions in the left Szilard grammar generating Szl(G).
THE COMPUTER JOURNAL, Vol. 41, No. 6, 1998

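Since S is the only nonterminal of G1 and G2, and every production starts with its own terminal symbol, a leftmost derivation can be followed with a single counter of pending S's. The sketch below uses this to test membership in L(G1) and L(G2). The names in_language, G1_DELTA and G2_DELTA are illustrative assumptions, not notation from the paper.

```python
from typing import Dict

# Net change in the number of S's caused by the production that starts
# with the given terminal (S -> aSS adds one S, S -> d removes one, ...).
G1_DELTA: Dict[str, int] = {"a": +1, "b": 0, "c": 0, "d": -1}
G2_DELTA: Dict[str, int] = {"a": +1, "b": -1}


def in_language(word: str, delta: Dict[str, int]) -> bool:
    """Follow the unique leftmost derivation of `word`, if there is one."""
    pending = 1  # the sentential form starts as the single symbol S
    for symbol in word:
        # No S left to rewrite, or a symbol outside the alphabet: reject.
        if pending == 0 or symbol not in delta:
            return False
        pending += delta[symbol]
    # A terminal string has been derived exactly when no S remains.
    return pending == 0


if __name__ == "__main__":
    for candidate in ["d", "abdd", "dad", "add"]:
        print(candidate, in_language(candidate, G1_DELTA))
```

The counter has to stay positive before each symbol is read and reach zero only at the very end, so a string such as dad is rejected even though it contains one more d than a.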
3. LEFT SZILARD BINARY TREE LANGUAGES

In this section we first reformulate and reprove some results from Xiang et al. [5]. Our proofs are based on the fact that G1 and G2 are both left Szilard grammars. Moreover, we give some new results concerning L(G1) and L(G2).

PROPOSITION 3.1.

(1) G1 is unambiguous (Theorem 1 in [5]).
(2) A string w is a word in L(G1) if and only if d(w) = a(w) + 1. A string w' is a suffix of a word w in L(G1) if and only if d(w') ≥ a(w') + 1 (Theorem 3 in [5]).
(3) A string w' = w_i w_(i+1) ... w_n is a suffix of a word w = w_1 ... w_i ... w_n in L(G1) if and only if

1 ≤ d(w') − a(w') ≤ i

(Lemma 1 of [5]).
(4) For each proper prefix w' of a word in L(G1) we have a(w') ≥ d(w') (the sufficient condition part of Theorem 5 of [5]).
(5) Each word in L(G1) ends up with a symbol d (Corollary 3 of Theorem 3 in [5]).

Proof.

(1) All left Szilard grammars are unambiguous.
(2) Each terminal derivation in G1 starts from S and ends up with a string with no appearances of S. Productions S → bS and S → cS do not change the number of S's, while S → aSS increases the number and S → d decreases the number by one. Hence, in each terminal derivation there must be one production S → d more than S → aSS to rewrite all S's. The one-to-one correspondence between productions and terminal symbols implies the claim concerning words in L(G1). The claim concerning suffixes follows from the fact that each proper suffix corresponds to a derivation from some sentential form to a terminal string. In order for the suffix to be proper the sentential form in question must contain at least one appearance of a nonterminal. Since S is the only nonterminal in G1, we can repeat the reasoning above. In fact, if a sentential form contains k appearances of S then we have d(w') = a(w') + k for the corresponding suffix w'.
Again, since productions S → bS and S → cS do not change the number of nonterminals, w' is a suffix of any word w = vw' where

a(v) − d(v) + 1 = d(w') − a(w').

(3) The lower bound follows from (2). The upper bound follows from the fact that the corresponding prefix w_1 ... w_(i−1) is produced by a derivation of length i − 1 where each step can increase the number of nonterminals by 1. Hence, the derivation producing w' starts from a sentential form containing at most i nonterminals. The upper bound is reached when S → d is applied i times to such a sentential form. (The derivation related to the upper bound is S ⇒ ... ⇒ a^(i−1) S^i ⇒ ... ⇒ a^(i−1) d^i.)
(4) This is obvious from the grammatical point of view. If the claim does not hold, we have no nonterminals left, and we cannot continue the derivation. Hence, w' could not be a proper prefix. (Again, the number of b's and c's is irrelevant.)
(5) S → d is the only terminating production in G1. Each terminal derivation ends up with an application of a terminating production.

The other results concerning L(G1) given in [5] could be proved in a similar manner.

Similar to (4) of Proposition 3.1 we can prove the following:

PROPOSITION 3.2. [5] If w' is a suffix of a word in L(G2), we have

b(w') − a(w') ≥ 1.

Usually this result is given in the following form.

PROPOSITION 3.3. [5] If w' is a proper prefix of a word in L(G2), we have

a(w') − b(w') ≥ 0.

Proposition 3.3 is known as the dominating property of Zaks' sequences [8].

The rest of this section is devoted to some new results concerning G1 and G2 and the languages generated by them.

PROPOSITION 3.4. Each word w, w ≠ d, in L(G1) can be written in the form w = xyz, y ≠ λ, such that all words x y^i z, i = 0, 1, ..., are in L(G1).

Proof. If w has a subword u in {b, c}+, we can choose y = u. Otherwise, w must contain the subword ad, and we can set y = ad. (The subword ad is produced by the subderivation S ⇒ aSS ⇒ adS which can always be repeated an arbitrary number of times.)

Actually, since S is the only nonterminal in G1, we can insert subwords (ad)^i, i = 1, 2, ..., and u, u ∈ {b, c}+, in any word w in L(G1) to obtain a new word in L(G1). The only restriction is that the subword cannot be inserted to the rear of w.

This observation has a natural interpretation: inserting ad corresponds to an expansion of an edge to contain a node with two children so that the left subtree has one node, and inserting u corresponds to an expansion of an edge to contain nodes with one child.

A result similar to Proposition 3.4 also holds for G2.

PROPOSITION 3.5. Each word w, w ≠ b, in L(G2) can be written in the form w = xyz, y ≠ λ, such that all words x y^i z, i = 0, 1, ..., are in L(G2).

Proof. We can always choose y = ab.

4. OTHER CODING SYSTEMS

The set of binary trees on n nodes is known to be in one-to-one correspondence with well-formed bracket sequences

with n pairs of brackets. Hence, such a bracket sequence can be considered as a binary tree code word. The language related to this coding system can be generated by the grammar

G3 : S → [S]S, S → λ.

Since L(G3) is not prefix-free, it cannot be a left Szilard language. (L(G3) is deterministic, but not strict deterministic in the sense of [10].)

To obtain Zaks' sequences or the words in L(G2), we can perform the following algorithm: label each internal node by a, label each leaf by b and read the labels in preorder. Reading the labels in level-by-level order (from top to bottom and from left to right) we obtain a different kind of code word. Level-by-level code words are used, for example, in [13].

The set of all level-by-level code words coincides with L(G1). The only difference between the two methods is the order in which nodes are handled. In the corresponding grammars this determines the rewriting order of nonterminals in sentential forms. In the method related to G1 this order is depth-first, while level-by-level code words use 'first-in-first-out' order or breadth-first order. In general, breadth-first order is not possible in normal context-free grammars. However, our grammars contain only one nonterminal (S). Hence, it does not make any difference whether we use depth-first or breadth-first order of nonterminal rewriting. (The breadth-first restriction in context-free derivations is studied, e.g., in [14, 15].)

So far, we have restricted ourselves to coding systems using a fixed alphabet. Most of the coding systems introduced in the literature use integers from the interval [1 ... n] when coding binary trees on n nodes. As an example of such coding systems we briefly discuss the left distance method [16]. In this coding method the code item related to a node equals the node's distance from the left arm (i.e. the path from the root following the left child pointers). The code items are read in preorder in order to obtain the whole code word. The valid code words in the left distance method have a simple characterization: (x_0, x_1, ..., x_(n−1)) is a valid code if and only if x_0 = 0 and 0 ≤ x_i ≤ x_(i−1) + 1, for i = 1, ..., n − 1. Based on this characterization, we can define a grammar generating the code words for binary trees on at most n nodes. The language generated is finite and hence a regular grammar is sufficient. The productions are essentially of the form

X_(i,j) → a_0 X_(i+1,0), X_(i,j) → a_1 X_(i+1,1), ..., X_(i,j) → a_(j+1) X_(i+1,j+1),

where the subscripts (i, j) of a nonterminal indicate the position in the code word (i), and the value of the code item in the i-th position (j). On the right-hand sides, the position is incremented (i + 1), and the possible value for the next code item is on the interval [0 ... j + 1].

5. CONCLUSION

We have studied binary tree code words as context-free languages. We have shown that properties of these languages can be more easily proved by noticing the derivational structure of the corresponding context-free grammar.

ACKNOWLEDGEMENTS

This work was supported by the Academy of Finland (Project 35025).

REFERENCES

[1] Mäkinen, E. (1991) A survey on binary tree codings. Comp. J., 34, 438–443.
[2] Mäkinen, E. (1991) Efficient generation of rotational-admissible codewords for binary trees. Comp. J., 34, 379.
[3] Mäkinen, E. (1992) A note on graftings, rotations, and distances in binary trees. EATCS Bull., 46, 146–148.
[4] Lucas, J. M., Roelants van Baronaigien, D. and Ruskey, F. (1993) On rotations and the generation of binary trees. J. Algorithms, 15, 343–366.
[5] Xiang, L., Tang, C. and Ushijima, K. (1997) Grammar-oriented enumeration of binary trees. Comp. J., 40, 278–291.
[6] Korsh, J. F. (1993) Counting and randomly generating binary trees. Inf. Process. Lett., 45, 291–294.
[7] Bapiraju, V. and Bapeswara Rao, V. V. (1994) Enumeration of binary trees. Inf. Process. Lett., 51, 125–127.
[8] Zaks, S. (1980) Lexicographic generation of ordered trees. Theoret. Comput. Sci., 10, 63–82.
[9] Proskurowski, A. (1980) On the generation of binary trees. J. ACM, 27, 1–2.
[10] Harrison, M. A. (1978) Introduction to Formal Language Theory. Addison-Wesley, Reading, MA.
[11] Knuth, D. E. (1997) The Art of Computer Programming. Vol. 1, Fundamental Algorithms (3rd edn). Addison-Wesley, Reading, MA.
[12] Mäkinen, E. (1985) On context-free derivations. Acta Univ. Tamper., 197.
[13] Lee, C. C., Lee, D. T. and Wong, C. K. (1983) Generating binary trees of bounded height. Acta Inf., 23, 529–544.
[14] Cherubini, A., Citrini, C., Crespi-Reghizzi, S. and Mandrioli, D. (1990) Breadth and depth grammars and deque automata. Int. J. Found. Comp. Sci., 1, 219–232.
[15] Mäkinen, E. (1991) A hierarchy of context-free derivations. Fundam. Inf., 14, 255–259.
[16] Mäkinen, E. (1987) Left distance binary tree representations. BIT, 27, 163–169.
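The characterization of valid left distance code words given in Section 4 (x_0 = 0 and 0 ≤ x_i ≤ x_(i−1) + 1) can be checked and enumerated directly. The following Python sketch is only an illustration: the function names are hypothetical, and unlike the grammar in the text, which covers trees on at most n nodes, the generator below lists the code words of length exactly n. Its recursion mirrors the productions X_(i,j) → a_k X_(i+1,k) with k ranging over [0 ... j + 1].

```python
from typing import Iterator, List, Sequence


def is_left_distance_code(xs: Sequence[int]) -> bool:
    """x_0 = 0 and 0 <= x_i <= x_(i-1) + 1 for every later position i."""
    if not xs or xs[0] != 0:
        return False
    return all(0 <= xs[i] <= xs[i - 1] + 1 for i in range(1, len(xs)))


def left_distance_codes(n: int) -> Iterator[List[int]]:
    """Enumerate all valid code words of length exactly n."""

    def extend(prefix: List[int]) -> Iterator[List[int]]:
        if len(prefix) == n:
            yield list(prefix)
            return
        # After an item with value j the next item may be 0, 1, ..., j + 1.
        for value in range(prefix[-1] + 2):
            prefix.append(value)
            yield from extend(prefix)
            prefix.pop()

    if n >= 1:
        yield from extend([0])


if __name__ == "__main__":
    for code in left_distance_codes(3):
        print(code)
```

For small n the number of code words produced can be compared with the number of binary trees on n nodes as a sanity check of the characterization.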