
GRAPHS AS FORMAL LANGUAGES

MOSES A. BOUDOURIDES

A Very Brief Outline of the Theory of Formal Languages

Let us first start with a brief discussion of the very fundamental definitions of the theory of formal languages (standard references are the books of Lewis & Papadimitriou, 1998, and Hopcroft et al., 2001). An alphabet $\Sigma$ is a (finite or countably infinite) set of symbols. A string over this alphabet is any finite sequence of symbols. Typically, a string $w$ over $\Sigma$ will be written as $w = \sigma_1 \sigma_2 \cdots \sigma_k$, for some $\sigma_1, \sigma_2, \ldots, \sigma_k \in \Sigma$. The string without any symbols is called the empty string and it is denoted by $\varepsilon$. We denote by $\{w\}$ the set of symbols contained in the string $w$. The length $|w|$ of a string $w$ is defined as the cardinality of $\{w\}$, i.e., it is the number of symbols contained in $w$ (apparently, strings of length one are just symbols and $|\varepsilon| = 0$). If a string $w$ is of the form $w = uzv$, for three strings $u$, $z$ and $v$, then $u$ is called a prefix, $v$ a suffix and $z$ a substring (of $w$). If a string $w$ is of the form $w = \alpha z \beta$, for two symbols $\alpha$ and $\beta$ (possibly $\alpha = \beta$) and a substring $z$, then the symbols $\alpha$ and $\beta$ are called terminal (we say that $\alpha$ is the left-terminal and $\beta$ is the right-terminal of $w$) and all the symbols contained in the substring $z$ of $w$ are called internal. The reversal of a string $w$, denoted by $w^R$, is defined as $w^R = \varepsilon$, if $|w| = 0$, and $w^R = \sigma u^R$, if $|w| > 0$ and $w = u\sigma$, for some symbol $\sigma \in \Sigma$ and some string $u$. Moreover, a string $w$ is called palindromic, if $w = w^R$. For any string $w$ and any integer $j \geq 0$, the $j$-th power of $w$ is defined as $w^j = \varepsilon$, if $j = 0$, and $w^j = w w^{j-1}$, if $j > 0$, and the (Kleene) star closure of the string $w$ is defined as $w^* = \sum_{j=0}^{\infty} w^j$. Thus, the set of all possible strings over the alphabet $\Sigma$ will be denoted by $\Sigma^*$ (always assuming that $\varepsilon \in \Sigma^*$); $\Sigma^*$ is called the universal language (over $\Sigma$). Any set of strings over $\Sigma$, i.e., any subset $L \subseteq \Sigma^*$, is called a (formal) language. Notice that, above and elsewhere from now on, the $+$ sign and the summation operator $\sum$ denote unions ($\cup$), as this is a typical notational convention in the theory of formal languages.
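The definitions above may be illustrated by a minimal Python sketch, assuming that strings are represented as tuples of symbols. The function names (`symbols`, `length`, `reversal`, `power`, `is_palindromic`) are hypothetical and serve only this illustration. Note that `length` follows the definition of the text (cardinality of the symbol set), which coincides with the number of positions only when all symbols are distinct.

```python
def symbols(w):
    """The set {w} of symbols contained in the string w (a tuple)."""
    return set(w)


def length(w):
    """|w| as defined in the text: the cardinality of {w}."""
    return len(symbols(w))


def reversal(w):
    """w^R, recursively: eps^R = eps and (u s)^R = s u^R."""
    if not w:
        return ()
    return (w[-1],) + reversal(w[:-1])


def power(w, j):
    """w^j, recursively: w^0 = eps and w^j = w w^(j-1) for j > 0."""
    return () if j == 0 else w + power(w, j - 1)


def is_palindromic(w):
    """A string is palindromic if it equals its own reversal."""
    return w == reversal(w)


w = ("a", "b", "a")
print(length(w))                    # 2, since {w} = {a, b}
print(reversal(("a", "b", "c")))    # ('c', 'b', 'a')
print(power(("a", "b"), 2))         # ('a', 'b', 'a', 'b')
print(is_palindromic(w))            # True
```

The recursive forms are chosen to mirror the inductive definitions rather than for efficiency; slicing such as `w[::-1]` would give the same reversal.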

A Typology of Strings Motivated by Graph Theory

Now, being motivated by the theory of graphs (see, for example, the definitions on pp. 4-5 of Bollobás, 1998), if we suppose that we have a language $L \subseteq \Sigma^*$, we may identify the alphabet $\Sigma$ with the set of vertices $V$ of a graph $G = (V, R)$ (where $R$ is an appropriate subset of the Cartesian product $V \times V$). Then, we obtain the following definitions:

(i) A string is called nontrivial if it contains at least two distinct symbols. Usually, a nontrivial string is called a walk.

(ii) A walk (or string) is called a trail if its terminal symbols are distinct. A closed trail, i.e., a trail whose terminal symbols coincide, is called a circuit.

(iii) A trail is called a path if all its symbols are distinct, i.e., a path is of the form $w = \alpha z \beta$, for two terminal symbols $\alpha$ and $\beta$ and a string $z$ with $|z| \geq 0$ such that $\alpha \neq \beta$, $\alpha, \beta \notin \{z\}$ and, for any two symbols $\sigma, \tau$ at different positions of $z$, $\sigma \neq \tau$. When $|z| = k - 2$, for some $k \geq 2$, then the path $w = \alpha z \beta$ is called a $k$-path. Usually, a 2-path $w = \alpha\beta$ is called a dyad. The following is an example of a path in a graph:

Figure 1. A 5-path.

(iv) A path is called cyclic if it contains at least three symbols and its terminal symbols are the same, i.e., a cyclic string is of the form $w = \alpha z \alpha$, for a symbol $\alpha$ and a path $z$ such that $|z| \geq 2$, $\alpha \notin \{z\}$ and, for any two symbols $\sigma, \tau$ at different positions of $z$, $\sigma \neq \tau$. When $|z| = k - 1$, for some $k \geq 3$, then the cyclic path $w = \alpha z \alpha$ is called a $k$-cycle. Of course, any cyclic string $w = \alpha z \alpha$ can also be represented as $w = \beta x \beta$, for any $\beta \in \{z\}$ such that $\beta \notin \{x\}$ (but $\alpha \in \{x\}$), where the ordered symbols of $\{\beta x\}$ derive from a cyclic permutation of the symbols of $\{\alpha z\}$. Usually, a 3-cycle $w = \alpha\beta\gamma\alpha$ is called a triangle. The following is an example of a cycle in a graph:

Figure 2. A 6-cycle.

In this way, we may define the following elementary languages as sets of the above graph-theoretic typology of strings (as walks):

(i) Walks intersecting at a single symbol: Let us consider a finite number of walks, $w_1, w_2, \ldots, w_k$, all of which contain the symbol $\sigma$ and do not contain any other common symbol. Without any loss of generality, let us assume that $\sigma$ is the left-terminal of all these walks, i.e., $w_1 = \sigma u_1$, $w_2 = \sigma u_2$, $\ldots$, $w_k = \sigma u_k$, for certain strings $u_1, u_2, \ldots, u_k$. Then, the union $w_1 + w_2 + \cdots + w_k$ of these walks forms a language $L_{k\text{-star}}(\sigma)$, which is called a $k$-star centered at $\sigma$ (with $k$ radii $u_1, u_2, \ldots, u_k$):

$$L_{k\text{-star}}(\sigma) = \left( \sum_{i=1}^{k} \sigma u_i \right).$$

For example, in a graph, a $k$-star looks as follows:

Figure 3. A 4-star at $\sigma$.

(ii) Walks branching off at distinct symbols: A more general case than $k$-stars is the case of a series of walks the terminal symbols of which are (appropriate) $k$-stars. Then, the union of these walks forms a language $L_{\text{dendrite}}(\sigma)$, which is called an $h$-dendrite, i.e., it is a tree with root $\sigma$ reaching some height $h$. In the particular case that a dendrite is composed of dyads, it is written as:

$$L_{\text{dendrite}}(\sigma) = \sigma \left( \sum_{i_1=1}^{k^{(1)}} u_{i_1} \sigma_{i_1} \left( \sum_{i_2=1}^{k^{(2)}} u_{i_1 i_2} \sigma_{i_1 i_2} \left( \sum_{i_3=1}^{k^{(3)}} u_{i_1 i_2 i_3} \sigma_{i_1 i_2 i_3} \left( \cdots \left( u_{i_1 i_2 \cdots i_{h-1}} \sigma_{i_1 i_2 \cdots i_{h-1}} \left( \sum_{i_h=1}^{k^{(h)}} u_{i_1 i_2 \cdots i_h} \sigma_{i_1 i_2 \cdots i_h} \right) \right) \cdots \right) \right) \right) \right),$$

where all the $\sigma$'s are distinct symbols and all the $u$'s are distinct paths, which do not contain any common symbols (nor any of the $\sigma$'s). The following is an example of an $h$-dendrite in a graph:

Figure 4. A 3-dendrite at $\sigma$.

(iii) Cycles intersecting at a single symbol: Let us consider a finite number of cycles, all of which contain the symbol $\sigma$. Then their union forms a language $L_{k\text{-cyclic bundle}}$, which is called a $k$-cyclic bundle around $\sigma$:

$$L_{k\text{-cyclic bundle}} = \left( \sum_{i=1}^{k} \sigma u_i \sigma \right),$$

where all the $u_i$'s are distinct paths, which do not contain any common symbols (nor $\sigma$ as an internal symbol). This structure in a graph looks like:


Figure 5. A 4-cyclic bundle.

In the special case that the above cycles are triangles, we obtain a $k$-two-path $L_{k\text{-two-path}}(\alpha, \beta)$ from $\alpha$ to $\beta$:

$$L_{k\text{-two-path}}(\alpha, \beta) = \left( \sum_{i=1}^{k} \alpha \gamma_i \beta \right),$$

which in a graph looks like:

Figure 6. A 4-two-path.

Moreover, in the latter case, if the dyad $\alpha\beta$ is a common substring of all the cycles, we obtain a $k$-triangle $L_{k\text{-triangle}}(\alpha, \beta)$ with common edge $\alpha\beta$, which can be written as:

$$L_{k\text{-triangle}}(\alpha, \beta) = \left( \sum_{i=1}^{k} \alpha \gamma_i \beta \alpha \right).$$

This structure in a graph looks like:


Figure 7. A 4-triangle.

(iv) Stars and cycles intersecting at a single symbol: The union of a $p$-star centered at symbol $\sigma$ and $q$ $r$-cycles passing through $\sigma$ forms a language $L_{(p,q,r)\text{-lollipop}}(\sigma)$, which is called a $(p, q, r)$-lollipop:

$$L_{(p,q,r)\text{-lollipop}}(\sigma) = \left( \sum_{i=1}^{p} \sigma u_i + \sum_{i=1}^{q} \sigma v_i^r \sigma \right),$$

where the $u_i$'s and the $v_i^r$'s are distinct paths (without any common symbols, nor containing $\sigma$ internally) such that each one of the $v_i^r$'s has length $r - 1$. In a graph, this structure looks like:


Figure 8. A (3,1,6)-lollipop.

Languages Corresponding to Graphs

Now, in a graph $G = (V, R)$, any element of $R$ is defined to join two vertices of $V$ and, thus, we write $r = xy$ (for $r \in R$ and $x, y \in V$) and we say that the vertices $x$ and $y$ are adjacent (or neighboring) in $G$. Of course, in a graph, not all vertices are adjacent to each other (unless the graph is complete). Therefore, if we take vertices as symbols, a graph $G$ can be identified with a language $L(G)$ (over the alphabet $V$), which is formed by interpreting adjacency of vertices as concatenation of symbols. We call $L(G)$ the graph language (corresponding to graph $G$) and formally we define $L(G)$ as:

$$L(G) = \{x_1 x_2 \cdots x_k \in V^* : k \geq 2 \text{ and } x_j x_{j+1} \in R, \text{ for all } j = 1, \ldots, k-1\}.$$

In other words, strings in $L(G)$ are not arbitrary concatenations of symbols/vertices: they are solely concatenations corresponding to walks in $G$. Hence, for a general graph $G$, the language $L(G)$ constitutes only a particular subset of the universal language $V^*$. Furthermore, as long as $G$ is specified to be a graph of a certain type, then $L(G)$ is necessarily constrained into an appropriate (proper) subset of $V^*$. For instance, if there exist no loops in the graph (i.e., no edges joining a vertex to itself), then all symbols/vertices become idempotent in the sense that $x^* = x$, and if the graph is undirected, then all strings should be palindromic, since adjacency is a symmetric relation.

So, in the sequel, let us restrict ourselves to the case that the graph $G$ is simple and undirected, in which case we write $G = (V, E)$, where $V$ is the set of the graph vertices and $E$ the set of the graph edges. As we have already mentioned, this implies the following three things:

(i) First, being simple means that, for any vertex $v$, $vv \notin E$ (and, in general, for any integer $j > 2$, $v^j \notin E$). However, for the sake of formal simplicity, if vertices are seen as symbols, we may identify $vv$ (and $v^j$) with $v$ itself, in the sense that a concatenation of a symbol/vertex with itself does not produce anything else but keeps on being itself, i.e.: $v^j = v$, for any $v \in V$ and any integer $j \geq 1$.

(ii) Second, being undirected means that the adjacency relation between any two vertices is symmetric. Now, taking vertices as symbols implies that the concatenation of any two adjacent (distinct) symbols is reversible in the sense that: $uv = vu$, for any $u, v \in V$ such that $uv \in E$.

(iii) Third, being simple also means that between any two adjacent vertices $u$ and $v$ there exists only one symmetric edge $uv \in E$. But the symmetry of edges in dyads of adjacent vertices amounts to the following reduction of particular triads of symbols/vertices (due to (i) and (ii)): $uvu = uv = vu = vuv$, for any $u, v \in V$ such that $uv \in E$.

Therefore, the graph language $L(G)$, corresponding to a simple undirected graph $G$, should be sought as a subset of a particular set $\mathcal{V} \subset V^*$. Now, $\mathcal{V}$ will be called the universal graph language (over the set of vertices $V$, which are seen as symbols) and it is defined as:

$$\mathcal{V} = \{w \in V^* : w \text{ satisfies properties P1 and P2}\},$$

where

P1: $w$ is such that a symbol $v \in \{w\}$ may reappear in $w$ only after at least two other (distinct) symbols, i.e., for any $vzv$ and $z$ substrings of $w$, we have necessarily that $v \notin \{z\}$ and $|z| \geq 2$.

P2: $w$ is palindromic (i.e., $w^R = w$), which, in particular, implies that $\mathcal{V} \subset V^*$.
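The definition of $L(G)$ and property P1 may be sketched in Python as follows. This is only an illustration under stated assumptions: strings are tuples of vertex names, an undirected edge is a two-element `frozenset`, the names `in_graph_language` and `satisfies_P1` are hypothetical, and P1 is read as a condition on consecutive occurrences of the same symbol, with $|z|$ taken as the number of distinct symbols of $z$ (as in the first section).

```python
def in_graph_language(w, edges):
    """True if w has at least two symbols and every pair of consecutive
    symbols of w is an edge, i.e., w corresponds to a walk in the graph."""
    return len(w) >= 2 and all(
        frozenset((x, y)) in edges for x, y in zip(w, w[1:])
    )


def satisfies_P1(w):
    """True if every symbol reappears only after at least two other
    distinct symbols (checked on consecutive occurrences)."""
    last_seen = {}
    for j, v in enumerate(w):
        if v in last_seen:
            between = set(w[last_seen[v] + 1 : j])
            if len(between) < 2:
                return False
        last_seen[v] = j
    return True


# A triangle on the (hypothetical) vertices a, b, c.
triangle = {frozenset(p) for p in [("a", "b"), ("b", "c"), ("a", "c")]}

print(in_graph_language(("a", "b", "c", "a"), triangle))  # True
print(in_graph_language(("a", "a"), triangle))            # False (no loops)
print(satisfies_P1(("a", "b", "c", "a")))                 # True
print(satisfies_P1(("a", "b", "a")))                      # False
```

The string `("a", "b", "a")` fails P1 because it is exactly a triad of the form $uvu$, which the reductions above identify with the dyad $uv$.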


In other words, given a set of vertices/symbols $V$, the universal graph language $\mathcal{V}$ is composed of all (possible) walks that could be formed among these vertices (in the corresponding complete graph), i.e., $L(G) = \mathcal{V}$ if and only if $G$ is complete.

Now, let us assume that the simple and undirected graph $G$ is, moreover, connected. Then from the breadth-first search algorithm (cf. Easley & Kleinberg, 2010, pp. 413), we obtain the following result.

Proposition 1. The graph language $L(G) \subseteq \mathcal{V}$ (over the alphabet $V$) corresponding to the connected simple undirected graph $G = (V, E)$ can be represented, for any vertex $v$, as:

$$L(G) = \sum_{j=1}^{m} L_j(v),$$

where each $L_j(v)$ is a concatenation of lollipops passing through $v$ and any two concatenations of lollipops $L_j(v)$ and $L_k(v)$ may intersect only at vertices which are terminal symbols of strings in at least one of them.

Example 1. Consider the graph of Figure 9.

Figure 9. The graph of Example 1.

Then we have, for instance with respect to $v_1$, $L(G) = L_1(v_1) + L_2(v_1) + L_3(v_1)$, where

$L_1(v_1) = \{v_1 v_2 v_3 v_4 v_6 v_1\}$, i.e., a 5-cycle,
$L_2(v_1) = v_7 (v_1 + v_5 + v_4 v_6 v_7)$, i.e., a (2,1,3)-lollipop,
$L_3(v_1) = \{v_9 v_1 v_8 v_5 v_{10}\}$, i.e., a 5-path.
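The labels attached to the three components of Example 1 can be checked against the typology of strings with a short Python sketch. The helper names (`k_path`, `k_cycle`, `expand`, `lollipop_signature`) are hypothetical, strings are tuples of vertex names, and `lollipop_signature` assumes, without checking it, that all strings share the same left-terminal symbol.

```python
def k_path(w):
    """Return k if w is a k-path (k >= 2 symbols, all distinct), else None."""
    if len(w) >= 2 and len(set(w)) == len(w):
        return len(w)
    return None


def k_cycle(w):
    """Return k if w = a z a is a k-cycle (z a path with |z| = k - 1 >= 2
    not containing a), else None."""
    if len(w) >= 4 and w[0] == w[-1]:
        z = w[1:-1]
        if len(set(z)) == len(z) and w[0] not in z:
            return len(z) + 1
    return None


def expand(center, branches):
    """Expand center(u_1 + ... + u_n) into the set of strings center u_i."""
    return {(center,) + tuple(u) for u in branches}


def lollipop_signature(strings):
    """Return (p, q, r) for a union of p paths and q r-cycles, else None."""
    paths = [w for w in strings if k_path(w) is not None]
    cycles = [w for w in strings if k_cycle(w) is not None]
    sizes = {k_cycle(w) for w in cycles}
    if len(paths) + len(cycles) != len(strings) or len(sizes) > 1:
        return None
    return (len(paths), len(cycles), sizes.pop() if sizes else None)


L1 = {("v1", "v2", "v3", "v4", "v6", "v1")}
L2 = expand("v7", [("v1",), ("v5",), ("v4", "v6", "v7")])
L3 = {("v9", "v1", "v8", "v5", "v10")}

print([k_cycle(w) for w in L1])   # [5]
print(lollipop_signature(L2))     # (2, 1, 3)
print([k_path(w) for w in L3])    # [5]
```

The sketch only classifies the strings as written in the example; it does not verify that they are walks in the graph of Figure 9, whose edge set is not listed in the text.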


Example 2. Consider the graph of Figure 10.

Figure 10. The graph of Example 2.

Then we have, for instance with respect to $v_6$, $L(G) = L_1(v_6) + L_2(v_6) + L_3(v_6)$, where

$L_1(v_6) = v_6 v_5 (v_2 v_1 v_3 v_2 + v_3 + v_9)$, i.e., a 4-star concatenated with a triangle,
$L_2(v_6) = v_{10} (v_6 + v_9 + v_{11})$, i.e., a 3-star,
$L_3(v_6) = \{v_6 v_7 v_{11} v_8 v_4 v_7\}$, i.e., a (1,1,4)-lollipop.

Grammars Corresponding to Graphs

Since any string in a graph language $L(G)$ (over $V$) is derived from the primitive regular expressions $\emptyset$, $\varepsilon$ and $v \in V$ by a finite number of applications of the operations of concatenation and union, $L(G)$ is a regular language (which, moreover, is constrained by symmetry and idempotency, whenever $G$ is a simple undirected graph). As a consequence, $L(G)$ can be generated by a regular grammar $\Gamma = (U, T, P)$, where $U$ is the set of variables, $T$ is the set of terminal symbols and $P$ is the set of rules producing the strings of $L(G)$. In this case, we have:

$U = E$, i.e., the variables are the graph edges,
$T = V$, i.e., the terminal symbols are the graph vertices,

and every vertex $v \in V$, which is incident with two edges $E_1, E_2 \in E$, generates a rule in $P$ of the form $E_1 \to v E_2$, which is a symmetric rule: i.e., $E_2 \to v E_1$ holds too. Furthermore, now it is not really necessary to assign a special symbol to a start variable. We may start from any edge/variable, although we still need to append the following rule of termination to the final edge/variable $E_f$: $E_f \to \varepsilon$.
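The rule scheme just described may be sketched in Python as follows. This is a minimal illustration, not a definitive construction: the names `graph_grammar` and `generate` are hypothetical, edges are given as a dictionary from (assumed) edge names to pairs of vertices, and the termination rule $E \to \varepsilon$ is attached to every edge, on the assumption that any edge may play the role of the final variable $E_f$. The sample graph is a 2-triangle with ASCII vertex names `a`, `b`, `c1`, `c2` chosen for this sketch.

```python
from itertools import permutations


def graph_grammar(edges):
    """Build the rules E1 -> v E2 for every vertex v shared by two distinct
    edges E1, E2. A production is a pair (v, E2); () stands for epsilon."""
    rules = {name: {()} for name in edges}  # assumption: any edge may be final
    for (e1, ends1), (e2, ends2) in permutations(edges.items(), 2):
        for v in set(ends1) & set(ends2):
            rules[e1].add((v, e2))
    return rules


def generate(rules, start, max_len):
    """All terminal strings with at most max_len symbols derivable from start."""
    results = set()
    frontier = [((), start)]
    while frontier:
        prefix, var = frontier.pop()
        for production in rules[var]:
            if production == ():
                results.add(prefix)            # apply E -> epsilon
            elif len(prefix) < max_len:
                v, nxt = production
                frontier.append((prefix + (v,), nxt))
    return results


# A 2-triangle: two triangles sharing the edge E1 = ab.
edges = {
    "E1": ("a", "b"),
    "E2": ("a", "c1"),
    "E3": ("a", "c2"),
    "E4": ("b", "c1"),
    "E5": ("b", "c2"),
}
rules = graph_grammar(edges)
strings = generate(rules, "E1", 4)
print(("a", "c1", "b", "a") in strings)   # True
print(("a", "c2", "b", "a") in strings)   # True
```

Because the sketch generates a rule for every pair of edges sharing a vertex, it may contain more rules than a hand-written grammar for the same graph, and it also derives strings with immediately repeated symbols, which the text identifies with shorter strings through $v^j = v$.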


Of course, in this way, we may produce all nontrivial strings of the graph language. For any isolated vertex $v$ of the graph, we need to add a rule of the form $F \to v$, where the variable $F$ is a new variable, which is not an edge.

Example 3. Consider the graph $G$ of Figure 11, i.e., a 2-triangle.

Figure 11. A 2-triangle.

Here, the set of variables is $U = \{E_1, E_2, E_3, E_4, E_5\}$, the set of terminals is $T = \{\alpha, \beta, \gamma_1, \gamma_2\}$ and the grammatical rules are the following:

$E_1 \to \alpha E_2 \mid \alpha E_3 \mid \beta E_4 \mid \beta E_5$,
$E_2 \to \alpha E_1 \mid \gamma_1 E_4 \mid \varepsilon$,
$E_3 \to \alpha E_1 \mid \gamma_2 E_5 \mid \varepsilon$,
$E_4 \to \beta E_1 \mid \gamma_1 E_2$,
$E_5 \to \beta E_1 \mid \gamma_2 E_3$.

Since $L(G) = \alpha(\gamma_1 + \gamma_2)\beta\alpha$, the first triangle is produced as:

$$E_1 \Rightarrow \alpha E_2 \Rightarrow \alpha\gamma_1 E_4 \Rightarrow \alpha\gamma_1\beta E_1 \Rightarrow \alpha\gamma_1\beta\alpha E_2 \Rightarrow \alpha\gamma_1\beta\alpha$$

and the second triangle is produced as:

$$E_1 \Rightarrow \alpha E_3 \Rightarrow \alpha\gamma_2 E_5 \Rightarrow \alpha\gamma_2\beta E_1 \Rightarrow \alpha\gamma_2\beta\alpha E_2 \Rightarrow \alpha\gamma_2\beta\alpha.$$

References

Bollobás, Béla. 1998. Modern Graph Theory. New York: Springer.
Easley, David, & Kleinberg, Jon. 2010. Networks, Crowds, and Markets: Reasoning About a Highly Connected World. New York: Cambridge University Press.
Hopcroft, John E., Motwani, Rajeev, & Ullman, Jeffrey D. 2001. Introduction to Automata Theory, Languages, and Computation. 2nd edn. Boston: Addison-Wesley.
Lewis, Harry R., & Papadimitriou, Christos H. 1998. Elements of the Theory of Computation. 2nd edn. Upper Saddle River, NJ: Prentice-Hall.

Department of Mathematics, University of Patras, 265 00 Patras, Greece
E-mail address: mboudour@upatras.gr
