
Uploaded by Paul Wintereise

This document explains how the CYK algorithm works, and provides two parsers -- one implemented in C++, and one in Haskell to deal with problems.


Aaron Gorenstein

February 15, 2014

This is a brief literate program written in Haskell implementing the CYK algorithm. I assume the reader is already familiar with the CYK algorithm. My implementation is very simplistic; this is unavoidable, as I am a Haskell novice. This has not deterred me, because my target audience is other novices! In particular, I'm eager to share what I found to be the natural way of expressing the CYK algorithm in Haskell, especially in comparison to my C++ implementation. In many ways it looks quite similar, but it seems to more faithfully reflect the recursive structure our dynamic programming algorithm counts on.

There are two main sections in this document. The first details the foundation. There we define how to express a CFG that is in CNF, and some reverse-lookup functions. That is, for a string of symbols α in the grammar G, is there a production in G of the form A → α? If so, return all such A. The second section focuses entirely on the main CYK algorithm, and on explaining how our Haskell code reflects the imperative definition as found in the C++ version. A final epilogue shows a very simple instantiation of the algorithm on a grammar and input string.

Without further ado:

import Data.Array -- for array and (!)
import Data.List -- for nub

The function nub takes a list and removes redundant elements. For instance, the list [1, 2, 3, 2, 3] is transformed to [1, 2, 3].

This code assumes that the grammar is in Chomsky Normal Form (CNF), the definition of which I will not address here; it's very standard. Regardless, we're able to define our grammar productions as having exactly two forms (tuples, really), as we know that each production involves exactly 2 or 3 symbols.

data Production = NTprod String String String | Tprod String String
type CFG = [Production]

The datatype Production is either a nonterminal production (NTprod) of the form A → BC (observe that the right-hand side is made of nonterminals), or a terminal production (Tprod) of the form A → a, where a is a terminal. We specify these options very concretely with tuples of the associated string representations. A context-free grammar is simply a collection of these productions, hence the naive definition in list form in line 2 of the above code snippet. Observe that many things about the grammar are not checked (that the start symbol is not used recursively, for instance).
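As an illustration of the kind of checks the text says are left out, here is a minimal sketch. The function names (`lhsSymbols`, `undefinedNonterminals`, `startNotOnRHS`) are hypothetical and not part of the original program; the `Production` type is repeated only so that the snippet stands alone.

```haskell
import Data.List (nub)

-- Same shape as the Production type described above.
data Production = NTprod String String String  -- A -> B C
                | Tprod String String          -- A -> a
                deriving (Show, Eq)

type CFG = [Production]

-- Left-hand sides of all productions, without duplicates.
lhsSymbols :: CFG -> [String]
lhsSymbols cfg = nub [lhs p | p <- cfg]
  where lhs (NTprod a _ _) = a
        lhs (Tprod a _)    = a

-- Nonterminals used on a right-hand side but never defined by any production.
-- (Hypothetical helper, not in the original program.)
undefinedNonterminals :: CFG -> [String]
undefinedNonterminals cfg =
  nub [x | NTprod _ b c <- cfg, x <- [b, c], x `notElem` lhsSymbols cfg]

-- True when the given start symbol appears on no right-hand side.
-- (Hypothetical helper, not in the original program.)
startNotOnRHS :: String -> CFG -> Bool
startNotOnRHS start cfg =
  and [b /= start && c /= start | NTprod _ b c <- cfg]
```

For the grammar `[NTprod "S" "A" "B", Tprod "A" "a"]`, `undefinedNonterminals` returns `["B"]`, since no production has B on its left-hand side.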

With our grammar object defined, the only remaining feature we want to define is the reverse lookup: given a string of symbols α from the grammar, return any nonterminal A such that the production A → α exists. As we know our grammar is in CNF, there are exactly two cases: either α is two nonterminals, or α is a single terminal. To facilitate implementation, we first want to be able to extract the left-hand side of a Production of either sort.

prodLHS (NTprod a _ _) = a
prodLHS (Tprod a _) = a

Now we can do the reverse lookup for the two cases. They're obviously very similar. Open question: is there a nicer way of implementing this, perhaps as a single function? Regardless, here is the A → BC lookup (given BC, find all A such that A → BC):

ntGens :: CFG -> (String, String) -> [String]
ntGens cfg (a, b) = map prodLHS (filter filterFunc cfg)
  where filterFunc (NTprod _ x y) = x == a && y == b
        filterFunc _ = False

The A → a lookup (given a, find all A such that A → a) is analogous:

termGens :: CFG -> String -> [String]
termGens cfg b = map prodLHS (filter filterFunc cfg)
  where filterFunc (Tprod _ y) = y == b
        filterFunc _ = False
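One possible answer to the open question above, offered only as a sketch: give the right-hand side its own type, so that a single function covers both cases. The names `RHS`, `Pair`, `Term`, `splitProd`, and `gens` are hypothetical and do not appear in the original program.

```haskell
data Production = NTprod String String String | Tprod String String

type CFG = [Production]

-- Hypothetical helper type: the right-hand side of a CNF production.
data RHS = Pair String String   -- two nonterminals, B C
         | Term String          -- a single terminal, a
         deriving (Eq, Show)

-- Split a production into its left-hand side and its right-hand side.
splitProd :: Production -> (String, RHS)
splitProd (NTprod a b c) = (a, Pair b c)
splitProd (Tprod a t)    = (a, Term t)

-- All left-hand sides whose production has exactly the given right-hand side.
gens :: CFG -> RHS -> [String]
gens cfg rhs = [a | (a, rhs') <- map splitProd cfg, rhs' == rhs]
```

Under this sketch, `gens cfg (Pair "A" "B")` plays the role of `ntGens cfg ("A", "B")`, and `gens cfg (Term "a")` plays the role of `termGens cfg "a"`. The trade-off is one extra type in exchange for removing the duplicated filter logic.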

Those are the only data types and functions we need! We are now ready to define our main algorithm.

Let's jump right into the main algorithm, and we shall explain it line by line:

cykMatrix cfg s =
  let termGens' = termGens cfg
      ntGens' = ntGens cfg
      n = length s
      m = array ((0,0), (n-1,n-1))
            ([((i,i), termGens' [s!!i])
               | i <- [0..n-1]] ++
             [((r, r+l), generators r l) | l <- [1..n-1], r <- [0..n-l-1]])
        where generators :: Int -> Int -> [String]
              generators r l =
                nub $ concat [ntGens' (a,b) | t <- [0..l-1],
                                              a <- m!(r,r+t),
                                              b <- m!(r+t+1,r+l)]
  in m

Lines 1-5 are just setting up the function; the real magic starts afterwards. Recall from the CYK algorithm design that m!(i, j) is the set of all nonterminals which generate the substring of s starting at index i and ending at j. The base case, as shown on lines 6-7, follows naturally: at m!(i, i), the nonterminals are exactly those which generate the single terminal found at location i in s. (The curious [s!!i] notation is simply because s!!i is a character, and to satisfy the type checker we have to promote it back to a String, which we do with the [] notation.) On line 8, we say more or less the same thing, conceptually, but in the general case.

Thus, it is entirely in the list comprehension beginning on line 11 that the magic happens. It states: the nonterminals which can generate the substring from index r to r + l are exactly those C involved in a production C → AB, where A and B are represented by (a, b) in the ntGens call. Moreover, for any a, b, it must be that a generates some substring-prefix s[r, r + t] and b generates the corresponding suffix s[r + t + 1, r + l]. The values of t range from 0 to l - 1. Quite a lot is packed into that single line!
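To make the roles of r, t, and l concrete, the following small sketch only enumerates the index pairs that the comprehension consults; it does no parsing. The names `splits` and `cells` are hypothetical helpers introduced for illustration.

```haskell
-- For the substring s[r .. r+l], list each way to cut it into a nonempty
-- prefix s[r .. r+t] and a nonempty suffix s[r+t+1 .. r+l].
splits :: Int -> Int -> [((Int, Int), (Int, Int))]
splits r l = [((r, r + t), (r + t + 1, r + l)) | t <- [0 .. l - 1]]

-- The non-diagonal cells (r, r+l) in the order the matrix definition lists
-- them for an input of length n: shorter spans first.
cells :: Int -> [(Int, Int)]
cells n = [(r, r + l) | l <- [1 .. n - 1], r <- [0 .. n - l - 1]]
```

For example, `splits 0 2` gives `[((0,0),(1,2)),((0,1),(2,2))]`: the span from 0 to 2 is covered either by cell (0,0) followed by cell (1,2), or by cell (0,1) followed by cell (2,2). Likewise `cells 3` gives `[(0,1),(1,2),(0,2)]`, so every cell a span depends on is strictly shorter than the span itself.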

I hope that was somewhat comprehensible. Note that compared with the C++ version, that single list comprehension replaces three for loops! Most importantly, I think the list comprehension better expresses the relationships between those three integers r, t, and l. Hooray!

That was the major part of the work. At this point we'll simply present a hard-coded example. In the future this code may be extended to actually take things from input files and so forth, but for now this is just a quick test.

First, we use the matrix to actually compute our decision algorithm:

"S" `elem` cykMatrix cfg s ! (0, n-1) -- where n = length s

exampleCFG :: CFG
exampleCFG = [NTprod "S" "A" "B",
              NTprod "A" "A" "A",
              Tprod "A" "a",
              NTprod "B" "B" "B",
              -- NTprod "B" "A" "A",
              Tprod "B" "b"]
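For readers who want to try the pieces together, here is a self-contained sketch that restates the definitions from this document in one file. Several things are assumptions rather than the original code: the wrapper name `accepts` is hypothetical, the start symbol is taken to be "S", the empty string is rejected outright, the B → AA production is left out, and the lookups are written as list comprehensions instead of map and filter.

```haskell
import Data.Array (Array, array, (!))
import Data.List (nub)

data Production = NTprod String String String | Tprod String String
type CFG = [Production]

prodLHS :: Production -> String
prodLHS (NTprod a _ _) = a
prodLHS (Tprod a _)    = a

-- Reverse lookups: which nonterminals produce the given right-hand side?
ntGens :: CFG -> (String, String) -> [String]
ntGens cfg (a, b) = [prodLHS p | p@(NTprod _ x y) <- cfg, x == a, y == b]

termGens :: CFG -> String -> [String]
termGens cfg b = [prodLHS p | p@(Tprod _ y) <- cfg, y == b]

-- m ! (i, j) holds the nonterminals generating s[i .. j]. Cells refer to
-- other cells of m; lazy evaluation computes each one only when needed.
cykMatrix :: CFG -> String -> Array (Int, Int) [String]
cykMatrix cfg s = m
  where
    n = length s
    m = array ((0, 0), (n - 1, n - 1))
          ([((i, i), termGens cfg [s !! i]) | i <- [0 .. n - 1]] ++
           [((r, r + l), generators r l) | l <- [1 .. n - 1], r <- [0 .. n - l - 1]])
    generators r l =
      nub $ concat [ntGens cfg (a, b) | t <- [0 .. l - 1],
                                         a <- m ! (r, r + t),
                                         b <- m ! (r + t + 1, r + l)]

-- Hypothetical wrapper: assumes the start symbol is "S" and rejects "".
accepts :: CFG -> String -> Bool
accepts _ ""  = False
accepts cfg s = "S" `elem` (cykMatrix cfg s ! (0, length s - 1))

exampleCFG :: CFG
exampleCFG = [ NTprod "S" "A" "B", NTprod "A" "A" "A", Tprod "A" "a"
             , NTprod "B" "B" "B", Tprod "B" "b" ]
```

With this grammar, which generates one or more a's followed by one or more b's, `accepts exampleCFG "aabb"` evaluates to `True` and `accepts exampleCFG "ba"` evaluates to `False`.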

Thank you for reading. I hope this has helped you understand something about Haskell or the CYK algorithm. I also hope I have been able to share my excitement and enthusiasm about how different programming languages can sometimes express things in intellectually pleasing, contrasting ways.
