
Aim: To study Lexical analyzer.

Theory:

LEX
Lex is a computer program that generates lexical analyzers ("scanners" or "lexers"). Lex is
commonly used with the YACC parser generator. Lex was originally written by Mike Lesk and
Eric Schmidt. It became the standard lexical analyzer generator on many Unix systems, and a
tool exhibiting its behaviour is specified as part of the POSIX standard.
Lex is a program generator designed for lexical processing of character input streams. It
accepts a high-level, problem-oriented specification for character string matching, and
produces a program in a general-purpose language which recognizes regular expressions. The
regular expressions are specified by the user in the source specifications given to Lex. The
code written by Lex recognizes these expressions in an input stream and partitions the input
stream into strings matching the expressions. The Lex source file associates the regular
expressions with program fragments. As each expression appears in the input to the
program written by Lex, the corresponding fragment is executed.
Lex is not a complete language, but rather a generator representing a new language feature
which can be added to different programming languages, called "host languages." Just as
general purpose languages can produce code to run on different computer hardware, Lex can
write code in different host languages. The host language is used for the output code
generated by Lex and also for the program fragments added by the user. Compatible run-time
libraries for the different host languages are also provided. This makes Lex adaptable to
different environments and different users. Each application may be directed to the
combination of hardware and host language appropriate to the task, the user's background,
and the properties of local implementations. At present, the only supported host language is
C. Lex itself exists on UNIX, GCOS, and OS/370; but the code generated by Lex may be
taken anywhere the appropriate compilers exist.
Lex reads an input stream specifying the lexical analyzer and outputs source code implementing
the lexer in the C programming language.

Structure of a Lex file


The structure of a Lex file is intentionally similar to that of a YACC file; files are divided into
three sections, separated by lines that contain only two percent signs, as follows:
Definition section
%%
Rules section
%%
C code section

The definition section defines macros and imports header files written in C. It is also
possible to write any C code here, which will be copied verbatim into the generated
source file.

The rules section associates regular expression patterns with C statements. When the
lexer sees text in the input matching a given pattern, it will execute the associated C
code.

The C code section contains C statements and functions that are copied verbatim to
the generated source file. These statements presumably contain code called by the
rules in the rules section. In large programs it is more convenient to place this code in
a separate file linked in at compile time.
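
The pairing of patterns and actions in the rules section can be sketched outside Lex. The
following is a minimal Python illustration, not Lex itself and not the code Lex generates: the
rule table RULES, the token names and the function scan are hypothetical names chosen for this
example. It mimics two Lex conventions: the longest match wins, and when two rules match the
same length the rule listed first is used.

```python
import re

# Hypothetical rule table: (regular expression, action). Each action receives
# the matched text and returns a token, or None to discard the match.
RULES = [
    (r"[0-9]+", lambda text: ("NUMBER", int(text))),
    (r"if|else|while", lambda text: ("KEYWORD", text)),
    (r"[A-Za-z_][A-Za-z0-9_]*", lambda text: ("IDENT", text)),
    (r"[-+*/=()]", lambda text: ("OP", text)),
    (r"[ \t\n]+", lambda text: None),  # white space: matched but not reported
]

COMPILED = [(re.compile(pattern), action) for pattern, action in RULES]


def scan(source):
    """Partition source into tokens by running the action of the best rule."""
    tokens = []
    pos = 0
    while pos < len(source):
        best_len, best_action = 0, None
        for regex, action in COMPILED:
            match = regex.match(source, pos)
            # Longest match wins; on a tie the earlier rule is kept.
            if match and len(match.group()) > best_len:
                best_len, best_action = len(match.group()), action
        if best_action is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        token = best_action(source[pos:pos + best_len])
        if token is not None:
            tokens.append(token)
        pos += best_len
    return tokens


print(scan("if x1 = 42"))
```

Because of the tie-breaking rule, the input "if" is reported as a keyword, while "ifx" is
reported as an identifier since the identifier pattern matches more characters.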

Lex and parser generators, such as YACC or Bison, are commonly used together. Parser
generators use a formal grammar to parse an input stream, something which Lex cannot do
using simple regular expressions (Lex is limited to simple finite state automata).
It is typically preferable to have a (YACC-generated, say) parser be fed a token-stream as
input, rather than having it consume the input character-stream directly. Lex is often used to
produce such a token-stream.
Conclusion: Hence we studied LEX.

Aim: To study YACC and JavaCC


Theory:

YACC
YACC stands for Yet Another Compiler-Compiler. It is a tool that translates a grammar
describing a language into a parser for that language. YACC provides a general tool for
imposing structure on the input to a computer program.
The YACC user prepares a specification of the input process; this includes rules describing
the input structure, code to be invoked when these rules are recognized, and a low-level
routine to do the basic input. YACC then generates a function to control the input process.
This function, called a parser, calls the user-supplied low-level input routine (the lexical
analyzer) to pick up the basic items (called tokens) from the input stream. These tokens are
organized according to the input structure rules, called grammar rules; when one of these
rules has been recognized, then user code supplied for this rule, an action, is invoked; actions
have the ability to return values and make use of the values of other actions.
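
The interplay of tokens, grammar rules and actions can be sketched as follows. This is a
hand-written shift-reduce loop in Python for a two-rule grammar, offered only as an
illustration under stated assumptions: it is not the table-driven LALR parser that YACC
generates, and the Python functions yylex and yyparse are stand-ins that merely borrow the
conventional YACC names. Each reduction runs an "action" that computes a value from the
values of the symbols it replaces, shown in YACC notation in the comments.

```python
import re


def yylex(text):
    """Stand-in for the user-supplied lexical analyzer: yields (token, value)."""
    for lexeme in re.findall(r"\s*(\d+|\S)", text):
        if lexeme.isdigit():
            yield ("NUM", int(lexeme))
        else:
            yield (lexeme, lexeme)  # character token such as '+'


def yyparse(text):
    """Shift-reduce loop for:  expr : expr '+' term | expr '-' term | term ;
                               term : NUM ;"""
    tokens = yylex(text)
    stack = []  # entries are (symbol, value)
    lookahead = next(tokens, ("$end", None))
    while True:
        symbols = [symbol for symbol, _ in stack]
        if symbols[-1:] == ["NUM"]:
            # term : NUM              { $$ = $1; }
            stack.append(("term", stack.pop()[1]))
        elif symbols[-3:] == ["expr", "+", "term"]:
            # expr : expr '+' term    { $$ = $1 + $3; }
            right = stack.pop()[1]
            stack.pop()
            left = stack.pop()[1]
            stack.append(("expr", left + right))
        elif symbols[-3:] == ["expr", "-", "term"]:
            # expr : expr '-' term    { $$ = $1 - $3; }
            right = stack.pop()[1]
            stack.pop()
            left = stack.pop()[1]
            stack.append(("expr", left - right))
        elif symbols == ["term"]:
            # expr : term             { $$ = $1; }
            stack.append(("expr", stack.pop()[1]))
        elif symbols == ["expr"] and lookahead[0] == "$end":
            return stack[0][1]  # accept: the whole input is one expr
        elif lookahead[0] in ("NUM", "+", "-"):
            stack.append(lookahead)  # shift the next token
            lookahead = next(tokens, ("$end", None))
        else:
            raise SyntaxError(f"syntax error near {lookahead[0]!r}")


print(yyparse("10 - 4 + 3"))
```

The parser pulls tokens from the lexical analyzer one at a time, and the value returned by one
action becomes an input to the next, which is why "10 - 4 + 3" is evaluated left to right.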
YACC is a parser generator. It is to parsers what Lex is to scanners. YACC is probably the
most common of the LALR tools.
A YACC grammar is written in a notation similar to Backus-Naur Form (BNF), and a YACC file has
the suffix .y.
There are four steps involved in creating a parser with YACC:
1. Specify the grammar:
a. Write the grammar in a .y file.
b. Write a lexical analyzer to process input and pass tokens to the parser.
c. Write a function that starts parsing by calling yyparse().
d. Write error handling routines.
2. Generate a parser by running YACC over the grammar file.
3. Compile the code produced by YACC as well as any other relevant source files.
4. Link the object files to appropriate libraries for the executable parser.

Fig: Structure of YACC Compiler


Symbols in YACC

Terminal symbol: A terminal symbol represents a class of syntactically equivalent tokens.
Terminal symbols are of three types:
o Named token: These are defined via the %token declaration. By convention, these are all
upper case.
o Character token: A character token is written in the same format as a character constant
in C.
o Literal string token: It is written like a string constant, e.g. "<<".
Non-terminal symbol: A non-terminal symbol stands for a group of non-terminal and terminal
symbols. By convention, these are all lower case.

File Format of YACC


The file format of YACC is as follows:
%{
C declarations
%}
YACC declarations
%%
Grammar rules
%%
Additional C code

Definitions section: All code between %{ and %} is copied to the C file. The
definitions section is where we configure various parser features such as defining
token codes, establishing operator precedence and association, and setting up the
global variable used to communicate between the scanner and the parser.

Rules section: The required productions section is where we specify the grammar rules.

Additional C code: It is used for ordinary C code that we want copied verbatim to the
generated C file.

Applications of YACC
YACC has been extensively used in numerous practical applications, including lint, the
Portable C Compiler, and a system for typesetting mathematics.

JavaCC (Java Compiler Compiler)


JavaCC (Java Compiler Compiler) is an open source parser generator and lexical analyzer
generator for the Java programming language. JavaCC is similar to yacc in that it generates a
parser from a formal grammar written in EBNF (Extended Backus-Naur Form) notation, except
that the output is Java source code. Unlike yacc, however, JavaCC generates top-down parsers,
which limits it to the LL(k) class of grammars (in particular, left recursion cannot be used).
JavaCC also generates lexical analyzers in a fashion similar to lex. The tree builder that
accompanies it, JJTree, constructs its trees from the bottom up. JavaCC is licensed under a
BSD license.
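
Why left recursion is a problem for a top-down parser can be seen in a small sketch. The
Python functions below are hypothetical illustrations, not JavaCC output: the first translates
a left-recursive rule literally into a recursive function, the second uses the usual rewrite
into a repetition, which a top-down parser can handle. Tokens are assumed to be a plain list
of integers and "-" strings.

```python
def expr_left_recursive(tokens, pos=0):
    """Literal top-down translation of  expr -> expr '-' NUM | NUM."""
    # The first alternative starts with expr itself, so the function calls
    # itself at the same position without consuming input and never returns.
    left, pos = expr_left_recursive(tokens, pos)
    return left - tokens[pos + 1], pos + 2


def expr_ll(tokens, pos=0):
    """Equivalent form  expr -> NUM ( '-' NUM )*  written as a loop."""
    value = tokens[pos]
    pos += 1
    while pos < len(tokens) and tokens[pos] == "-":
        value -= tokens[pos + 1]  # fold to the left to keep left associativity
        pos += 2
    return value


print(expr_ll([10, "-", 4, "-", 3]))

try:
    expr_left_recursive([10, "-", 4])
except RecursionError:
    print("left-recursive version never consumes a token")
```

The loop in expr_ll accumulates the result from left to right, so the rewritten rule still
treats subtraction as left-associative even though the grammar no longer says so directly.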
JavaCC is the most popular parser generator for use with Java applications. A parser
generator is a tool that reads a grammar specification and converts it to a Java program that
can recognize matches to the grammar. In addition to the parser generator itself, JavaCC
provides other standard capabilities related to parser generation such as tree building.
JavaCC works with any Java VM version 1.2 or higher. It has been certified to be 100% pure
Java. JavaCC has been tested on countless different platforms without any special porting
requirements. A Java compiler is a compiler for the Java programming language. The most
common form of output from a Java compiler is Java class files containing platform-neutral
Java bytecode. There are also compilers which generate optimized native machine code for a
particular hardware/operating system combination.

Tokens in the grammar file follow the same conventions as for the Java programming
language. Hence identifiers, strings, characters etc. used in the grammars are the same as Java
identifiers, Java strings, Java characters etc.
White space in the grammar files also follows the same conventions as for the Java
programming language. This includes the syntax for comments. Most comments present in
the grammar files are copied into the generated parser/lexical analyzer. Grammar files are
pre-processed for Unicode escapes just as Java files are.
List of software built using JavaCC:
Apache Derby
BeanShell
FreeMarker
PMD

Conclusion: Hence we studied YACC and JavaCC.

Aim: To study different debugger tools.


Theory:

Debugger
A debugger or debugging tool is a computer program that is used to test and debug other
programs (the "target" program). The code to be examined might alternatively be running on
an instruction set simulator (ISS), a technique that allows great power in its ability to halt
when specific conditions are encountered but which will typically be somewhat slower than
executing the code directly on the appropriate (or the same) processor. Some debuggers offer
two modes of operation (full or partial simulation) to limit this impact.
When the program "crashes" or reaches a preset condition, the debugger typically shows the
location in the original code if it is a source-level debugger or symbolic debugger, commonly
now seen in integrated development environments.
If it is a low-level debugger or a machine-language debugger it shows the line in the
disassembly (unless it also has online access to the original source code and can display the
appropriate section of code from the assembly or compilation).
Features of Debuggers:
Typically, debuggers also offer more sophisticated functions such as running a program step
by step (single-stepping or program animation), stopping (breaking) (pausing the program to
examine the current state) at some event or specified instruction by means of a breakpoint,
and tracking the values of variables. Some debuggers have the ability to modify program
state while it is running. It may also be possible to continue execution at a different location
in the program to bypass a crash or logical error.
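
Single-stepping, breakpoints and variable tracking can be demonstrated in a few lines with
Python's standard sys.settrace hook, which calls a function before each source line runs. The
sketch below is only an illustration of the idea, not how any particular debugger is built;
run_with_breakpoint and total are hypothetical names, and the breakpoint is given as a line
offset from the def statement of the traced function.

```python
import sys


def run_with_breakpoint(func, args, line_offset, variable):
    """Run func(*args); each time execution reaches the line that lies
    line_offset lines below the def statement, record the value of variable."""
    code = func.__code__
    target_line = code.co_firstlineno + line_offset
    seen = []

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # do not step through other functions
        if event == "line" and frame.f_lineno == target_line:
            # "Breakpoint hit": inspect the program state before the line runs.
            seen.append(frame.f_locals.get(variable))
        return tracer  # keep receiving one event per line (single-stepping)

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # always detach the "debugger"
    return result, seen


def total(n):
    acc = 0
    for i in range(n):
        acc += i  # line offset 3 from the def statement: the breakpoint
    return acc


print(run_with_breakpoint(total, (3,), 3, "acc"))
```

The recorded values are those of acc just before each execution of the breakpoint line, which
is the same view a debugger gives when it pauses at a breakpoint.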
The same functionality which makes a debugger useful for eliminating bugs allows it to be
used as a software cracking tool. It often also makes it useful as a general verification
tool, test coverage and performance analyzer, especially if instruction path lengths are shown.
Some debuggers operate on a single specific language while others can handle multiple
languages transparently. For example, if the main target program is written in COBOL but
calls assembly language subroutines and PL/1 subroutines, the debugger may have to
dynamically switch modes to accommodate the changes in language as they occur.
Some debuggers also incorporate memory protection to avoid storage violations such
as buffer overflow. This may be extremely important in transaction processing environments
where memory is dynamically allocated from memory 'pools' on a task by task basis.
Most modern microprocessors have at least one of these features in their CPU design to make
debugging easier:

Hardware support for single-stepping a program, such as the trap flag.

An instruction set that meets the Popek and Goldberg virtualization requirements makes it
easier to write debugger software that runs on the same CPU as the software being debugged;
such a CPU can execute the inner loops of the program under test at full speed, and still
remain under debugger control.

In-System Programming allows an external hardware debugger to reprogram a system under test
(for example, adding or removing instruction breakpoints). Many systems with such ISP support
also have other hardware debug support.

Hardware support for code and data breakpoints, such as address comparators and
data value comparators or, with considerably more work involved, page fault hardware.

JTAG access to hardware debug interfaces such as those on ARM architecture processors or using
the Nexus command set. Processors used in embedded systems typically have extensive JTAG debug
support.

Some of the most capable and popular debuggers implement only a simple command line
interface (CLI), often to maximize portability and minimize resource consumption.
Developers typically consider debugging via a graphical user interface (GUI) easier and more
productive. This is the reason for visual front-ends that allow users to monitor and control
subservient CLI-only debuggers via a graphical user interface. Some GUI debugger front-ends
are designed to be compatible with a variety of CLI-only debuggers, while others are targeted
at one specific debugger.
List of some well-known debuggers:
1. Turbo Debugger
2. GNU Debugger (GDB)
3. Intel Debugger (IDB)
4. LLDB
5. Microsoft Visual Studio Debugger
6. Valgrind
7. WinDbg

Java Compiler and Debugger


A Java compiler is a computer program or set of programs which translates Java source
code into Java bytecode.
The output from a Java compiler comes in the form of Java class files (with the .class
extension). Java source code is contained in files ending with the .java extension. The file
name must be the same as the class name, as in classname.java. When javac compiles the
source defined in a .java file, it generates bytecode for it and saves the bytecode in a
class file with a .class extension.
The most commonly used Java compiler is javac, included in the JDK from Sun Microsystems.
The following figure shows the working of the Java compiler:

Fig: Working of Java Compiler


Once the bytecode is generated, it can be run on any platform that provides a Java Virtual
Machine (JVM). The JVM interprets the bytecode (.class file), converting it into
machine-specific binary code, and runs that code on the host machine.
The JDK (Java Development Kit) also comes with a Java debugger known as jdb. jdb helps
you find and fix bugs in Java language programs. The Java Debugger, jdb, is a simple
command-line debugger for Java classes. It is a demonstration of the Java Platform Debugger
Architecture that provides inspection and debugging of a local or remote Java Virtual
Machine.
Syntax of jdb command is as follows:
jdb [ options ] [ class ] [ arguments ]

options - Command-line options.
class - Name of the class to begin debugging.
arguments - Arguments passed to the main() method of class.

There are many ways to start a jdb session. The most frequently used way is to
have jdb launch a new Java Virtual Machine (VM) with the main class of the application to
be debugged. This is done by substituting the command jdb for java in the command line. For
example, if your application's main class is MyClass, you use the following command to
debug it under jdb:
C:\> jdb MyClass

When started this way, jdb invokes a second Java VM with any specified parameters, loads
the specified class, and stops the VM before executing that class's first instruction.

Turbo Debugger
Turbo Debugger (TD) is a machine-level debugger for MS-DOS executables, intended
mainly for debugging Borland Turbo Pascal (TP), and later Turbo C (TC) programs, sold
by Borland. This tool was a full-screen debugger displaying both TP or TC source and
corresponding assembly-language instructions, with powerful capabilities for setting
breakpoints, watching the execution of instructions, monitoring machine registers, etc. Turbo
Debugger could be used for programs not generated by Borland compilers, but without
showing source statements; it was by no means the only debugger available for non-Borland
executables, and not a significant general-purpose debugger.
The original Turbo Debugger was a stand-alone product introduced in 1989, along
with Turbo Assembler and the second version of Turbo C.
To use Turbo Debugger with source display, programs, or relevant parts of programs, had to
be compiled with TP or TC with a conditional directive set which added debugging
information to the compiled executable, which related source statements and corresponding
machine code. The debugger would then be started (TD did not debug within the
development IDE). After debugging the program would be recompiled without debugging
information to reduce its size.
The current version of Turbo Debugger came with several versions of the debugger program:
TD.EXE was the basic debugger; TD286.EXE ran in protected mode, and TD386.EXE was a
virtual debugger which used the TDH386.SYS device driver to communicate with TD.EXE.
The TDH386.SYS driver also added breakpoints supported in hardware by the 386 and later
processors to all three debugger programs. TD386 allowed some extra breakpoints that the
other debuggers did not (I/O access breaks, ranges greater than 16 bytes, and so on). There
was also a debugger for Windows 3 (TDW.EXE). Turbo Debugger also supported remote
debugging.

GNU Debugger
The GNU Debugger, usually called just GDB and named gdb as an executable file, is the
standard debugger for the GNU operating system. However, its use is not strictly limited to
the GNU operating system; it is a portable debugger that runs on many Unix-like systems and
works for many programming languages, including Ada, C, C++, Objective-C, Free Pascal,
Fortran, Java and partially others.

Features

GDB offers extensive facilities for tracing and altering the execution of computer programs.
The user can monitor and modify the values of programs' internal variables, and even
call functions independently of the program's normal behaviour.
GDB supports a large number of processor architectures and runs on nearly any machine.
GDB is still actively developed. Version 7.0 added support for Python scripting. Since
version 7.0, GDB also supports "reversible debugging", which allows a debugging session to
step backward, much like rewinding a crashed program to see what happened.
Remote debugging
GDB offers a 'remote' mode often used when debugging embedded systems. Remote
operation is when GDB runs on one machine and the program being debugged runs on
another. GDB can communicate with the remote 'stub', which understands the GDB protocol, over
a serial line or TCP/IP. A stub program can be created by linking to the appropriate stub files
provided with GDB, which implement the target side of the communication
protocol. Alternatively, gdbserver can be used to remotely debug the program without
needing to change it in any way.
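
As a rough illustration of what GDB and a stub exchange, the sketch below frames and checks
packets in the style of the GDB remote serial protocol as the author of this example
understands it from the GDB manual: a packet is written as $payload#checksum, where the
checksum is the sum of the payload bytes modulo 256 printed as two hexadecimal digits. The
function names are hypothetical, and the sketch deliberately ignores acknowledgements and the
escaping of special characters inside the payload.

```python
def checksum(payload):
    """Sum of the payload bytes modulo 256."""
    return sum(payload.encode("ascii")) % 256


def frame_packet(payload):
    """Wrap a payload as $payload#xx, where xx is the checksum in hex."""
    return f"${payload}#{checksum(payload):02x}"


def parse_packet(packet):
    """Return the payload of a framed packet after verifying its checksum."""
    if len(packet) < 4 or not packet.startswith("$") or packet[-3] != "#":
        raise ValueError("malformed packet")
    payload = packet[1:-3]
    received = int(packet[-2:], 16)
    if checksum(payload) != received:
        raise ValueError("checksum mismatch")
    return payload


print(frame_packet("g"))           # a request to read the registers
print(parse_packet("$OK#9a"))      # a typical success reply
```

A real stub would also answer each packet with an acknowledgement character and resend on a
checksum failure; those steps are left out here to keep the framing itself visible.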
The same mode is also used by KGDB for debugging a running Linux kernel on the source
level with gdb. With KGDB, kernel developers can debug a kernel in much the same way as
they debug application programs. It makes it possible to place breakpoints in kernel code,
step through the code and observe variables. On architectures where hardware debugging
registers are available, watchpoints can be set which trigger breakpoints when specified
memory addresses are executed or accessed. KGDB requires an additional machine which is
connected to the machine to be debugged using a serial cable or Ethernet. On FreeBSD, it is
also possible to debug using FireWire direct memory access (DMA).

The debugger does not contain its own graphical user interface, and defaults to a
command-line interface. Several front-ends have been built for it, such as Xxgdb, Data Display
Debugger (DDD), Nemiver, KDbg, Xcode debugger, GDBtk/Insight and the HP Wildebeest
Debugger GUI (WDB GUI). IDEs such as Codelite, Code::Blocks, Dev-C++, GNAT Programming
Studio (GPS), KDevelop, Qt Creator, MonoDevelop, Eclipse, NetBeans and Visual Studio (see VS
AddIn Gallery) can interface with GDB. GNU Emacs has a "GUD mode" and several tools for Vim
exist. These offer facilities similar to debuggers found in IDEs.

Some other debugging tools have been designed to work with GDB, such as memory
leak detectors.

Examples of GDB commands

gdb program              debug "program" (from the shell)
run -v                   run the loaded program with the parameters
bt                       backtrace (in case the program crashed)
info registers           dump all registers
disass $pc-32, $pc+32    disassemble
Conclusion: Thus we have studied various debugging tools.
