
A ONE-PASS PRETTYPRINTER

by

Anthony C. Hearn
University of Utah, Salt Lake City, Utah 84112

and

Arthur C. Norman
Cambridge University, Cambridge, England

1. Introduction

The need for programs which print source text in a standard block-structured
format is obvious. First, a program text which incorporates an indentation scheme
based on a hierarchy of blocks or sub-expressions is much easier to read, debug and
modify than one in a less structured format. Printing standards have in fact been
set up for languages such as Algol 60, so by using a formatter based on such a
standard, one can hope to print programs in as exemplary a form as possible.
Secondly, experience with large software projects shows that it is important for
all members of the programming team to adopt the same programming format, for
better communication and efficiency among team members. Thirdly, the availability
of a cheap formatter allows one to store program text in a form which eliminates
many spurious characters, thus saving possibly valuable storage space, since the
formatter can always reproduce the original form when necessary. Finally, it is
important to remember that novice programmers tend to follow the style of sample
programs at their disposal, so a good formatter can help in programmer education.

There are many formatting programs now available for a variety of programming
languages, but in most cases the documentation of these is limited to a description
in the programming manual or is present solely as source text in the system itself.
We shall summarize the most popular techniques in the next section before proposing
a new method for program formatting which we have not found elsewhere in the
literature and which is sufficiently different to bear documenting. We shall
illustrate our technique by application to two specific languages, namely Lisp and
Rlisp [1], an Algol-like higher level form of Lisp in which a large algebraic
manipulation program Reduce [2] is written. However, our techniques are general
enough to apply to any conventional programming language, and it would be a
simple matter to adapt our existing programs to produce formatted source for
languages such as Pascal, Algol 68 or PL/I.

While our discussion of techniques for reformatting programs is clearly
related to the problems of mathematical typesetting [3] and of document
preparation, we will not concern ourselves directly with those issues. We are only
considering the specific problem of presenting program source in a form which is
input compatible or in a "documentation" style such as that defined for languages
like Pascal or Algol 68.

Work supported in part by the National Science Foundation under Grant No.
MCS76-15035 and by the Burroughs Corporation.

2. Existing Methods for Program Formatting

The type of formatting program we are concerned with is intended to produce an
elegant standardized output irrespective of the typing conventions that were used
when its input was originally prepared. It is therefore convenient to assume that
it will work on some compacted representation of this input where gross syntax has
been checked and, for instance, all syntactically redundant blank and other layout
characters have been removed. This compressed representation may be kept as a
file, say on disk, or it may in fact be a parse tree created as the original
program was read.

The aim of a formatting program is to produce output that displays the logical
structure of programs, and the major problem that it faces in doing this is
deciding where to insert line breaks in its output.

The existing methods for program formatting fall into five main classes.
Programs in the first class are purely concerned with rebuilding source files from
compacted files, and simply display the source text with each program statement
printed on a single line. This scheme is satisfactory for use with languages
(perhaps such as Fortran) that have a very strong statement structure, and can be
part of a valid package that economizes use of file storage. The major concern of
this note is, however, with languages that involve substantial nested
constructions (such as Algol blocks), and for these languages the simple
formatting scheme is generally inadequate.

Programs in the second class use a fixed set of keywords such as BEGIN, ELSE,
etc. for triggering indentation and line splitting. This method has little
computational overhead and can in fact lead to quite reasonable looking format.
However, such a scheme is rather sensitive to the style of program that it has to
process, and its output will tend to fluctuate between the generation of very long
output lines and the production of sparse zigzag displays of nested blocks. The
former can be a severe embarrassment if the output of the formatter is eventually
going to be used by a system that only accepts fixed length input records, or
indeed if it is to be printed on a finite width line printer. The latter, by
spreading programs over too many lines, eventually leads to a lack of clarity. It
also wastes paper.
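
A keyword-triggered formatter of this second class can be sketched in a few lines.
The sketch below is in Python purely for illustration; the token list, the keyword
sets and the indentation step of 3 are all assumptions, and it is not taken from
any of the programs cited here. Note that nothing in it limits the length of an
output line, which is exactly the weakness described above.

```python
OPEN_KEYWORDS = {"BEGIN"}    # assumed keywords that start a nested block
CLOSE_KEYWORDS = {"END"}     # assumed keywords that end one


def keyword_format(tokens, step=3):
    """Lay out a token list using only keywords and ';' to decide line breaks."""
    lines, current, depth = [], [], 0

    def flush():
        # Write the pending tokens as one line at the current depth.
        nonlocal current
        if current:
            lines.append(" " * (step * depth) + " ".join(current))
            current = []

    for tok in tokens:
        if tok in CLOSE_KEYWORDS:
            flush()
            depth = max(depth - 1, 0)
            current.append(tok)
        elif tok in OPEN_KEYWORDS:
            flush()
            current.append(tok)
            flush()
            depth += 1
        elif tok == ";":
            if current:
                current[-1] += ";"
            else:
                current.append(";")
            flush()
        else:
            current.append(tok)
    flush()
    return lines


if __name__ == "__main__":
    source = "BEGIN X := 1 ; BEGIN Y := 2 END ; RETURN X END".split()
    print("\n".join(keyword_format(source)))
```

A statement of any length between two semicolons ends up on a single line, so the
output width depends entirely on the style of the input program.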

The third scheme tries to avoid these two extremes by using a small set of
keywords to force indentation, and by using some approximate measure of program
complexity to guide it in its treatment of the rest of the code [4]. It is easiest
to illustrate the action of such a formatter by considering the case of Lisp code.
If, at any stage in a recursive print process, the structure to be displayed looks
'small', it will be printed on one line. Otherwise it will be printed on several
lines with some form of indentation to help emphasize its structure. There are two
favored ways of indenting Lisp code:

(a) (function argument1
              argument2
              argument3)

(b) (function
       argument1
       argument2
       argument3)

and with this method it is equally easy to implement either. A critical point in
the implementation of this scheme is the algorithm used to decide if a sub-list is
'small', and in effect the method that we propose in the next section is mainly
concerned with making such a decision. It is clear that simple implementations of
this style of formatter will still occasionally print lines that are longer than is
desirable, and that it will be easy for programs that involve deeply nested
constructs to cause confusion.
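
The third scheme can be illustrated with a short Python sketch. Everything in it
is an assumption made for illustration: Lisp data is modelled as nested Python
lists, 'small' is taken to mean "the one-line form is at most `limit` characters",
and the names `flat`, `looks_small` and `pretty_small` are hypothetical. Because
the test ignores the current indentation, the sketch can still produce over-long
lines, as noted above.

```python
def flat(x):
    """Render a nested list on a single line, Lisp style."""
    if isinstance(x, list):
        return "(" + " ".join(flat(e) for e in x) + ")"
    return str(x)


def looks_small(x, limit):
    """Approximate complexity measure: length of the one-line form."""
    return len(flat(x)) <= limit


def pretty_small(x, indent=0, style="a", limit=30):
    """Return output lines for x, using layout (a) or (b) for large lists."""
    pad = " " * indent
    if not isinstance(x, list) or not x or looks_small(x, limit):
        return [pad + flat(x)]
    head = flat(x[0])
    if style == "a" and len(x) > 1:
        # Layout (a): the first argument shares a line with the function name.
        arg_indent = indent + len(head) + 2
        first = pretty_small(x[1], arg_indent, style, limit)
        lines = [pad + "(" + head + " " + first[0].lstrip()] + first[1:]
        rest = x[2:]
    else:
        # Layout (b): every argument on its own line, indented by a fixed step.
        arg_indent = indent + 3
        lines = [pad + "(" + head]
        rest = x[1:]
    for arg in rest:
        lines.extend(pretty_small(arg, arg_indent, style, limit))
    lines[-1] += ")"
    return lines


if __name__ == "__main__":
    expr = ["function", "argument1", "argument2", "argument3"]
    print("\n".join(pretty_small(expr, style="a")))
    print("\n".join(pretty_small(expr, style="b")))
```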

These problems are avoided in the fourth class of programs available which
make a prepass over the program tree, measuring the precise size of sub-trees in
the printed form as a means of determining where to break lines [5]. Since these
sizes are also needed by the actual printing program, one has the choice of whether
to recalculate them on the actual print pass, or store them with the tree during
the prepass. Either choice has a cost: recalculation takes time, and storage takes
memory. These costs are of course compensated for by the fact that the formatter can be
made extremely flexible in its choice of layout, and that its accurate knowledge of
the implications of all decisions it makes enables it to be programmed to deal
gracefully with all the awkward cases that defeat simpler schemes.
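
A two-pass formatter of this fourth class might look roughly like the Python
sketch below. It is only an illustration under stated assumptions: nested Python
lists stand in for the program tree, the sizes are stored in a side table keyed by
object identity (the "store them with the tree" option), layout (b) with a step of
3 is used for lists that do not fit, and the names `measure`, `emit` and
`pretty_measured` are hypothetical.

```python
def flat(x):
    """Render a nested list on a single line, Lisp style."""
    if isinstance(x, list):
        return "(" + " ".join(flat(e) for e in x) + ")"
    return str(x)


def measure(x, sizes):
    """Prepass: record the exact one-line width of every sub-list."""
    if not isinstance(x, list):
        return len(str(x))
    # Two parentheses, the elements, and one blank between neighbours.
    width = 2 + sum(measure(e, sizes) for e in x) + max(len(x) - 1, 0)
    sizes[id(x)] = width
    return width


def emit(x, sizes, indent, closing, width, out):
    """Print pass; `closing` counts the ')' that must follow x on its last line."""
    size = sizes[id(x)] if isinstance(x, list) else len(str(x))
    if not isinstance(x, list) or not x or indent + size + closing <= width:
        out.append(" " * indent + flat(x) + ")" * closing)
        return
    out.append(" " * indent + "(" + flat(x[0]))
    args = x[1:]
    for i, arg in enumerate(args):
        last = i == len(args) - 1
        emit(arg, sizes, indent + 3, closing + 1 if last else 0, width, out)
    if not args:
        out[-1] += ")" * (closing + 1)


def pretty_measured(x, width=40):
    sizes, out = {}, []
    measure(x, sizes)
    emit(x, sizes, 0, 0, width, out)
    return out


if __name__ == "__main__":
    expr = ["setq", "rmar", ["difference", ["linelength", "nil"], "3"]]
    print("\n".join(pretty_measured(expr, width=30)))
```

The side table is the memory cost mentioned above; calling `measure` again inside
`emit` instead would trade that memory for repeated traversals.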

The final method which has been used is the picture compiler technique [6].
This is based on the idea of building a data structure that will represent the
entire printed form of the formatted program, and then passing over this structure
doing the actual output operations. This allows ultimate flexibility and control
over layout, and is needed for the display of mathematical formulas and the like.
It is, however, generally too powerful and too costly a tool for use on ordinary
programs, although its use may become more attractive as memory and processor costs
decrease still further.

3. A New Method

The new method that we propose here will be described in terms of a pair of
coroutines. One of these will be responsible for producing a stream of characters
that represent the program being printed, the other makes decisions about how these
characters should be displayed. These routines communicate via a FIFO buffer, and
it is the existence of this buffer that means that decisions about formatting can
be delayed until there is enough information available to make them reliably. Our
various implementations of formatters are written in languages that do not support
coroutines directly, and so in practice our programs achieve their effect through
use of a combination of ordinary recursion and the maintenance of explicit data
structures that keep track of the status of our two processes. It turns out that
the coroutine behavior we need to simulate is simple enough that this does not lead
to any great difficulty, and indeed the entire mechanism could be described as one
monolithic process. We have nevertheless found that logical separation of the
printing and formatting actions makes our method much easier to understand and
code.

When active, the print process pushes characters into the FIFO buffer.
Whenever it reaches a point where it could be reasonable to break a line in the
output, instead of emitting a straightforward blank character it constructs a
special marker and places that in the buffer. The marker will contain enough
information for the formatting process to discover what level of indentation would
be appropriate to use were a line break to be inserted at that point. If, when the
printer has finished working on some self-contained block of code it finds that
none of these markers have been touched by the formatter, it overwrites them with
normal blanks thereby arranging that short bodies of code do not get split over
several lines. An ultimate formatter might want to delay decisions on how to print
a program until the entire program text had been processed by the printing routine.
We wish to avoid such cost, and so insist that the formatter lags behind the
printer by at most one printed line. This means that our FIFO buffer can never
contain more than a line-full of text, and so can be implemented efficiently as a
ring buffer. The formatting process, then, gets woken up whenever the buffer
becomes full. Its initial action is then to print characters until it comes to one
of the special separation markers left by the print routine. This marker indicates
that it should start a new line, and tells it how many blanks should be printed to
produce sensible indentation. When a marker is acted on in this way some form of
message must be passed back to the print process so that it does not overwrite
subsequent separators with blanks when it finally reaches the end of the block.
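
The interplay of the two processes can be sketched as follows. This is a
simplified Python illustration written for this description, not a transcription
of any of our implementations. It assumes nested Python lists as input, a fixed
indentation of 3 columns per nesting level, a `deque` in place of a ring buffer,
and no special-case formats. The names `Block`, `Marker`, `OnePass` and
`prettyprint` are hypothetical. When no marker is available the sketch simply
lets the line run over.

```python
from collections import deque


class Block:
    """Printer-side record for one list being printed."""
    def __init__(self):
        self.broken = False   # set by the formatter: the message passed back
        self.markers = []     # separators emitted for this block


class Marker:
    """A possible line break, carrying the indentation to use if it is taken."""
    def __init__(self, block, indent):
        self.block, self.indent, self.blank = block, indent, False


class OnePass:
    def __init__(self, width):
        self.width = width
        self.lines, self.line = [], ""   # finished lines, line being output
        self.buf = deque()               # FIFO of characters and markers

    # ---- formatting process ----
    def _first_live(self):
        for item in self.buf:
            if isinstance(item, Marker) and not item.blank:
                return item
        return None

    def _output_to_marker(self):
        """Output buffered items up to the first live marker and break there."""
        while self.buf:
            item = self.buf.popleft()
            if isinstance(item, str):
                self.line += item
            elif item.blank:
                self.line += " "
            else:
                item.block.broken = True
                self.lines.append(self.line)
                self.line = " " * item.indent
                return True
        return False                     # no break point was available

    def _format(self):
        while True:
            live = self._first_live()
            if live is not None and live.block.broken:
                self._output_to_marker()  # later separator of a split block
            elif len(self.line) + len(self.buf) > self.width:
                if not self._output_to_marker():
                    return                # over-long line accepted in this sketch
            else:
                return

    # ---- print process ----
    def _put(self, item):
        self.buf.append(item)
        self._format()

    def print_expr(self, x, depth=0):
        if not isinstance(x, list):
            for ch in str(x):
                self._put(ch)
            return
        block = Block()
        self._put("(")
        for i, element in enumerate(x):
            if i:
                marker = Marker(block, 3 * (depth + 1))
                block.markers.append(marker)
                self._put(marker)
            self.print_expr(element, depth + 1)
        self._put(")")
        if not block.broken:              # short block: keep it on one line
            for marker in block.markers:
                marker.blank = True

    def finish(self):
        while self._output_to_marker():
            pass
        self.lines.append(self.line)
        return self.lines


def prettyprint(expr, width=40):
    pp = OnePass(width)
    pp.print_expr(expr)
    return pp.finish()
```

For example, `prettyprint(["setq", "rmar", ["difference", ["linelength", "nil"], "3"]], width=30)`
splits the two outer lists but keeps `(linelength nil)` on one line, because its
marker was still untouched when the printer reached the end of that list.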

The novel feature of our scheme is that it uses a limited amount of workspace
(the FIFO buffer) to keep track of the exact size of sub-expressions as they are
printed. The input text is analyzed just once, and the work of the formatter
involves just simple character manipulation, so the cost of formatted printing
using this method will not be much greater than the cost of using the print
coroutine to display a non-formatted version of the input document. A measurement
on one of our implementations, where major parts of the print process are common to
both a formatting and a direct printer, gave a ratio of 2:1 for the costs. Much of
this can be attributed to the fact that formatted output is spread over twice as
many lines as the dense non-formatted equivalent.

The above description defines a simple, uniform layout for programs that
guarantees to print as much on each line as it can, subject to keeping within a
pre-specified page width and to adhering to given indentation conventions. There
remain a number of adjustments to the program that are important if it is to be
generally acceptable.

The first adjustment that has to be made arises because in any given language
there are a number of specific constructs that call for special layouts. A
particularly striking case of this is blocks with labels in them, where the
approved style of display is probably:

BEGIN
     statement;
LAB: statement;
     statement
END;

with labels set out to the left of the rest of the program. In Lisp it is useful
to print programs with the shorthand notation 's standing for (QUOTE s), and with
slightly special indentation rules for important constructs such as LAMBDA, PROG
and function definition. By defining various formats for the marker characters
emitted by the print process, and by making the formatting routine treat these all
carefully, it is fairly easy to adjust our method to deal with most reasonable
special formatting requirements.

A more serious problem, and one that has given many previous formatting
programs difficulty, is the control of the layout of long lists. If an attempt is
made to declare, say, 30 variables at once, the declaration may easily get printed
on 30 lines as:

INTEGER I1,
        I2,
        ...
        I30;

whereas a much more reasonable effect would have been for it to have appeared as:

INTEGER I1,I2,I3,I4,I5,I6,
        I7,I8 ....
        .... I29,I30;

with the variables appearing in blocks. In practice this difficulty seems to arise
fairly commonly in Lisp, and slightly less frequently in other languages. Our
solution to this difficulty has been to adjust the formatting process so that when
it finds a marker character it can test if the buffer contains other markers that
correspond to the same level of program structure. If there are more than some
predefined number of markers stacked up, the buffer must contain the start of a
long thin list, and all but the last of the markers are printed as ordinary blanks,
and only the last one triggers the starting of a new line. With care this can lead
to reasonable layouts even for long lists that have a few embedded large items.
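
The marker-counting rule for long lists can be sketched in Python as below. The
representation is an assumption made for illustration: the buffer is a list whose
items are either text fragments or `("brk", level)` tuples standing for markers,
the threshold of 3 is arbitrary, and `take_line` is a hypothetical name for one
wake-up of the formatting process.

```python
def is_marker(item):
    return isinstance(item, tuple)


def take_line(buffer, threshold=3):
    """Return (text to output before the line break, remaining buffer)."""
    marks = [i for i, item in enumerate(buffer) if is_marker(item)]
    if not marks:
        return "".join(buffer), []       # no place to break at all
    level = buffer[marks[0]][1]
    same = []
    for i in marks:
        if buffer[i][1] != level:
            break                        # a different level ends the run
        same.append(i)
    # Many stacked markers of one level: a long thin list, so fill the line
    # and break only at the last marker. Otherwise break at the first one.
    cut = same[-1] if len(same) > threshold else same[0]
    text = "".join(" " if is_marker(item) else item for item in buffer[:cut])
    return text, buffer[cut + 1:]


if __name__ == "__main__":
    m = ("brk", 1)
    print(take_line(["I1,", m, "I2,", m, "I3,", m, "I4,", m, "I5,"]))
    print(take_line(["LONGNAME1,", m, "LONGNAME2,", m, "X"]))
```

With four markers in the buffer the first call fills the line and leaves only the
last item behind; with two markers the second call breaks at the first marker, so
a few large items are still set out one per line.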

Our buffering scheme can deal reasonably gracefully with programs that include
very long strings or names. If the formatting process is entered when there are no
marker characters in the buffer it knows that there are no remaining natural places
to break the current line. It attempts to get round this by trimming leading
blanks from the line, thereby violating the normal indentation conventions. Only
when this fails to allow it to fit a whole item on a single line does it resort to
arbitrary splitting of items and the introduction of continuation markers.

One problem that we have not solved adequately is that of programs with a very
complex structure, where our formatter finds itself using so much indentation that
there is little or no room for program. When this problem becomes extreme the
formatter just resets its indentation level back toward the left margin in an ugly
but effective way of making room for subsequent code.
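
The two fallbacks just described, for over-long items and for excessive
indentation, might be sketched in Python as follows. The details are assumptions:
the backslash used as a continuation marker, the function names `place_item` and
`clamp_indent`, and the decision to trim only as many blanks as are needed. The
20-column threshold mirrors the test `LMAR + 20>=RMAR` in the appendix listing,
but applying it at every indentation level is our guess for this sketch.

```python
def place_item(item, indent, width, cont="\\"):
    """Return the output lines for one item that has no internal break point."""
    if indent + len(item) <= width:
        return [" " * indent + item]
    if len(item) <= width:
        # Trim leading blanks: give up some indentation so the item fits.
        return [" " * (width - len(item)) + item]
    # Last resort: split arbitrarily, marking every piece but the last.
    step = width - len(cont)
    pieces = [item[i:i + step] for i in range(0, len(item), step)]
    return [piece + cont for piece in pieces[:-1]] + [pieces[-1]]


def clamp_indent(indent, width, room=20):
    """Pull the indentation back toward the left margin when room runs out."""
    if indent + room >= width:
        return max(width - room - 1, 0)
    return indent


if __name__ == "__main__":
    print(place_item("ABCDEFGHIJKL", 3, 5))
    print(clamp_indent(70, 77))
```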

Provided they can be handled by the printing process, our formatting scheme
can cope with comments as part of a program. Finding good rules to decide how they
should be positioned has caused us some trouble, mainly because of users who use
several consecutive lines of comments to draw themselves two-dimensional pictures.
Such block comments are not improved by being moved relative to each other!

The examples included as an appendix show the behavior of both our Rlisp parser
and our Lisp one when given one of the routines that make up the Lisp formatter.
The Rlisp shown is the original form of the code. The Lisp displayed was obtained
from the parsed form of the Rlisp code by removing a few redundant
(parser-introduced) initialization statements and by re-inserting the comments.

4. Current Implementations

While working on the ideas expressed in this note we have developed a number
of formatting programs. One version is written as an integral part of the print
routines in a Lisp system that is coded in BCPL [7]. A second, also formatting
Lisp, is implemented in Standard Lisp [8], a defined dialect of Lisp, and is
intended to be portable. The third is the version that formats Rlisp programs, and
that is integrated into the Rlisp programming environment.

Each of these implementations uses a different set of protocols to simulate
the coroutine and message passing discipline that we use, and each differs in the
details of how it copes with long lines and deeply indented structures; these
differences reflect both the practical difficulties that we encountered while
developing the code and our slightly different ideas about what constitutes the
ideal format for a program.

References

[1] Hearn, A.C., "Reduce 2 Symbolic Mode Primer", Utah Symbolic Computation Group
Operating Note No. 5 (October 1973).

[2] Hearn, A.C., "Reduce 2 User's Manual", Utah Symbolic Computation Group Report
No. UCP-19 (March 1973).

[3] Kernighan, B.W. and Cherry, L.L., "A System for Typesetting Mathematics",
Comm. ACM 18 (1975) 151-157.

[4] Hueras, J. and Ledgard, H., "An Automatic Formatting Program for Pascal",
Sigplan Notices 12 (1977) 82-84.

[5] Goldstein, I., "Pretty-printing, Converting List to Linear Structure", A. I.
Memo No. 279, A. I. Lab, M.I.T., (February 1973), NTIS No. AD-773-927.

[6] Martin, W.A., "Symbolic Mathematical Laboratory", Project MAC, MIT, Report No.
MAC-TR-36 (January 1967).

[7] Fitch, J.P. and Norman, A.C., "Implementing Lisp in a High-Level Language",
Software - Practice and Experience 7 (1977) 713-725.

[8] Marti, J.B., Hearn, A.C., Griss, M.L. and Griss, C., "Standard Lisp Report",
Utah Symbolic Computation Group Report No. UCP-60 (January 1978).

(a) Output Produced by the Rlisp Formatting Program

SYMBOLIC PROCEDURE SUPERPRINM(X,LMAR);
   BEGIN SCALAR STACK,BUFFERI,BUFFERO,BN,INITIALBLANKS,RMAR,
                PENDINGRPARS,INDENTLEVEL,INDBLANKS,RPARCOUNT,W;
      BUFFERI := (BUFFERO := LIST NIL); %FIFO BUFFER;
      INITIALBLANKS := 0;
      RPARCOUNT := 0;
      INDBLANKS := 0;
      RMAR := LINELENGTH NIL - 3; %RIGHT MARGIN;
      IF RMAR<25
        THEN ERROR(0,
                   LIST(RMAR + 3,
                        "LINELENGTH TOO SHORT FOR SUPERPRINTING"));
      BN := 0; %CHARACTERS IN BUFFER;
      INDENTLEVEL := 0; %NO INDENTATION NEEDED, YET;
      IF LMAR + 20>=RMAR THEN LMAR := RMAR - 21;
      %NO ROOM FOR SPECIFIED MARGIN;
      W := POSN();
      IF W>LMAR THEN <<TERPRI(); W := 0>>;
      IF W<LMAR THEN INITIALBLANKS := LMAR - W;
      PRINDENT(X,LMAR + 3); %MAIN RECURSIVE PRINT ROUTINE;
      OVERFLOW 'NONE; %FLUSH OUT THE BUFFER;
      RETURN X
   END;
-58-
One-Pass Prettyprinter

(b) Output Produced by the Lisp Formatting Program given Equivalent Input

(DE SUPERPRINM (X LMAR)
   (PROG
      (STACK BUFFERI BUFFERO BN INITIALBLANKS RMAR PENDINGRPARS
         INDENTLEVEL INDBLANKS RPARCOUNT W)
      (SETQ BUFFERI (SETQ BUFFERO (LIST NIL))) %C FIFO BUFFER
      (SETQ INITIALBLANKS 0)
      (SETQ RPARCOUNT 0)
      (SETQ INDBLANKS 0)
      (SETQ RMAR (DIFFERENCE (LINELENGTH NIL) 3))
      %C RIGHT MARGIN
      (COND
         ((LESSP RMAR 25)
            (ERROR
               0
               (LIST
                  (PLUS RMAR 3)
                  "LINELENGTH TOO SHORT FOR SUPERPRINTING"))))
      (SETQ BN 0) %C CHARACTERS IN BUFFER
      (SETQ INDENTLEVEL 0) %C NO INDENTATION NEEDED, YET
      (COND
         ((NOT (LESSP (PLUS LMAR 20) RMAR))
            (SETQ LMAR (DIFFERENCE RMAR 21))))
      %C NO ROOM FOR SPECIFIED MARGIN
      (SETQ W (POSN))
      (COND
         ((GREATERP W LMAR) (PROGN (TERPRI) (SETQ W 0))))
      (COND
         ((LESSP W LMAR)
            (SETQ INITIALBLANKS (DIFFERENCE LMAR W))))
      (PRINDENT X (PLUS LMAR 3))
      %C MAIN RECURSIVE PRINT ROUTINE
      (OVERFLOW 'NONE) %C FLUSH OUT THE BUFFER
      (RETURN X)))
