
The Common Gateway Interface (CGI)

1.1 What Is CGI?

As you traverse the vast frontier of the World Wide Web, you will come across documents that make you wonder, "How did they do this?" These documents could consist of, among other things, forms that ask for feedback or registration information, imagemaps that allow you to click on various parts of the image, counters that display the number of users that accessed the document, and utilities that allow you to search databases for particular information. In most cases, you'll find that these effects were achieved using the Common Gateway Interface, commonly known as CGI.

One of the Internet's worst-kept secrets is that CGI is astoundingly simple. That is, it's trivial in design, and anyone with an iota of programming experience can write rudimentary scripts that work. It's only when your needs are more demanding that you have to master the more complex workings of the Web. In a way, CGI is easy the same way cooking is easy: anyone can toast a muffin or poach an egg. It's only when you want a Hollandaise sauce that things start to get complicated.

CGI is the part of the Web server that can communicate with other programs running on the server. With CGI, the Web server can call up a program, while passing user-specific data to the program (such as what host the user is connecting from, or input the user has supplied using HTML form syntax). The program then processes that data and the server passes the program's response back to the Web browser.

CGI isn't magic; it's just programming with some special types of input and a few strict rules on program output. Everything in between is just programming. Of course, there are special techniques that are particular to CGI, and that's what this book is mostly about. But underlying it all is the simple model shown in Figure 1.1.

Figure 1.1: Simple diagram of CGI

1.4 Internal Workings of CGI

So how does the whole interface work? Most servers expect CGI programs and scripts to reside in a special directory, usually called cgi-bin, and/or to have a certain file extension. (These configuration parameters are discussed in the Configuring the Server section in this chapter.) When a user opens a URL associated with a CGI program, the client sends a request to the server asking for the file. For the most part, the request for a CGI program looks the same as it does for all Web documents. The difference is that when a server recognizes that the address being requested is a CGI program, the server does not return the file contents verbatim. Instead, the server tries to execute the program. Here is what a sample client request might look like:
GET /cgi-bin/welcome.pl HTTP/1.0
Accept: www/source
Accept: text/html
Accept: image/gif
User-Agent: Lynx/2.4 libwww/2.14
From: shishir@bu.edu

This GET request identifies the file to retrieve as /cgi-bin/welcome.pl. Since the server is configured to recognize all files in the cgi-bin directory tree as CGI programs, it understands that it should execute the program instead of relaying it directly to the browser. The string HTTP/1.0 identifies the communication protocol to use. The client request also passes the data formats it can accept (www/source, text/html, and image/gif), identifies itself as a Lynx client, and sends user information. All this information is made available to the CGI program, along with additional information from the server.

The way that CGI programs get their input depends on the server and on the native operating system. On a UNIX system, CGI programs get their input from standard input (STDIN) and from UNIX environment variables. These variables store such information as the input search string (in the case of a form), the format of the input, the length of the input (in bytes), the remote host and user passing the input, and other client information. They also store the server name, the communication protocol, and the name of the software running the server.

Once the CGI program starts running, it can either create and output a new document, or provide the URL to an existing one. On UNIX, programs send their output to standard output (STDOUT) as a data stream. The data stream consists of two parts. The first part is either a full or partial HTTP header that (at minimum) describes what format the returned data is in (e.g., HTML, plain text, GIF, etc.). A blank line signifies the end of the header section. The second part is the body, which contains the data conforming to the format type reflected in the header. The body is not modified or interpreted by the server in any way.

A CGI program can choose to send the newly created data directly to the client or to send it indirectly through the server. If the output consists of a complete HTTP header, the data is sent directly to the client without server modification. (It's actually a little more complicated than this, as we will discuss in Chapter 3, Output from the Common Gateway Interface.) Or, as is usually the case, the output is sent to the server as a data stream. The server is then responsible for adding the complete header information and using the HTTP protocol to transfer the data to the client.
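The two-part output stream described above (header, blank line, body) can be sketched in C. This is only an illustration of the common case in which the program emits a partial header and leaves the rest to the server; the function name write_cgi_response is made up for this sketch and is not part of any CGI library.

```c
#include <stdio.h>

/* Hypothetical helper (an assumption for illustration): writes a minimal
   CGI response to `out`. Part one is a partial header naming the data
   format; a blank line ends the header section; part two is the body. */
int write_cgi_response(FILE *out, const char *content_type, const char *body)
{
    /* Header line, followed by the blank line that closes the headers. */
    if (fprintf(out, "Content-type: %s\n\n", content_type) < 0)
        return -1;
    /* Body: data in the format announced by the header. */
    if (fputs(body, out) == EOF)
        return -1;
    return 0;
}
```

A CGI program would typically call this from main with stdout as the stream, for example write_cgi_response(stdout, "text/html", "<h2>Hello</h2>\n").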
CGI Programming 101: CGI Programming With Apache and Perl on Windows XP

This page will show you how to install the Apache web server and Perl on your home computer. You'll then be able to write CGI programs and test them locally on your computer. Once Apache is installed and running, you'll be able to view your pages by pointing your web browser at the http://localhost/ address. You don't even need to be connected to the internet to view local pages and CGI programs, which can be quite useful if you want to work on programming while you're traveling or otherwise offline. These instructions have been tested on Windows XP. You should be able to install Apache and Perl on earlier versions of Windows, but on those systems you should definitely read the installation instructions that come with the software, since some things may need to be configured differently.

Who can see your website?
Programming Locally, then Uploading to the ISP
Differences Between CGI Programs on Unix and Windows
Installing Apache on Windows XP
Installing ActivePerl on Windows XP
Configuring Apache
Viewing Your Site
Writing Your CGI Programs
Other Perl Editors
Troubleshooting

Who can see your website?

If you have a permanent, fixed IP address for your computer (e.g. your computer is in an office, or you have your own T1 line), your Apache server will be able to serve pages to anyone in the world*. If you have a transient IP address (e.g. you use a dialup modem, DSL modem or cable modem to connect to the internet), you can give people your temporary IP address and they can access your page using the IP address instead of a host name (e.g., http://209.189.198.102/)*. But when you log out, your server will no longer be connected, and when you dial in again you'll probably have a different IP address. For permanent web hosting, you should either get a fixed IP address (and your own domain name), or sign up with an ISP that can host your pages for you (like cgi101.com).

* Unless you're behind a firewall, and the firewall is not configured to allow web traffic through.
Programming Locally, then Uploading to the ISP

You may want to develop and debug your programs on your own computer, then upload the final working versions to your ISP for permanent hosting. Nearly all of the programs shown in CGI Programming 101 will work seamlessly on Unix or Windows, but see below for a few differences.
Differences between CGI Programs on Unix and Windows

1. The "shebang" line. The first line of a Perl program (often called the shebang line) typically looks like this:
#!/usr/bin/perl

The actual location of Perl may be different from system to system (e.g. /bin/perl, /usr/local/bin/perl, etc.). For ActivePerl on Windows, this line should be changed to:
#!/perl/bin/perl

If you're programming locally and uploading to a remote ISP, you'll have to change this line each time... unless your ISP was thoughtful enough to add a symlink to Perl in /perl/bin/perl. (We've done that on cgi101.com.)

2. Permissions. On XP you don't need to worry about file permissions. A CGI program is always executable, and your programs can always write to files in your directory. (Although this isn't necessarily a good thing...) On Unix, permissions matter. Your CGI programs will need to be set with execute permissions. Any files you want to write to will need to be set with write permissions. CGI Programming 101 includes instructions on how to properly adjust file permissions for CGI programs in Unix. If you are writing your programs on XP and are not planning to upload them to a Unix server, you can simply disregard the permissions information.

Installing Apache on Windows XP

First go to http://httpd.apache.org/download.cgi and download Apache. Scroll down the page a bit until you find the one that says "best available version" (Apache 2.something). Then look for the "Win32 Binary (MSI Installer)". Download the binary .msi file to your computer (choose "open" rather than "save" so the installer will launch immediately).

The installer start screen.

Server Information - use localhost for both the Network Domain and the Server Name, unless you have a fixed IP address and your own domain name. Put your e-mail address for the Administrator's Email Address.

Setup Type - select "Typical"

Destination folder - the default is fine, C:\Program Files\Apache Group\

Finish the installation and quit the installer. At this point Apache is probably already running on your machine; go to http://localhost/ in your browser to view your start page.
To start/stop the Apache server, go to the Start menu and navigate to All Programs > Apache HTTP Server > Control Apache Server. There you can start, stop and restart Apache. You can also install the Apache taskbar icon via the "Monitor Apache Servers" option.

If you want to modify the homepage displayed by your Apache server, go to the Start menu and choose "My Computer", then navigate to Local Disk (C:) > Program Files > Apache Group > Apache 2. You'll see a folder containing items like this:

Open the htdocs folder and look for index.html. You can edit the file in Notepad or whatever HTML editor you like. For the programming examples in CGI Programming 101 we're going to create a separate folder in your "My Documents" area for CGI programs and HTML files. There's not really any need to modify the files here in htdocs unless you are setting up your own webserver and plan to host your own domain there.

Now you'll need to install Perl.

Installing ActivePerl on Windows XP

Installing Perl should be just as easy as installing Apache. Go to http://www.activestate.com/Products/ActivePerl/ and click on the download link to begin. Download the latest version of Perl available (which is 5.8.1 as of November 2003). Download the MSI file and open it.

The installer starts up.

On the Custom Setup screen, you can leave the setup as the default. This will install Perl, PPM (the Perl Package Manager) and programming examples to your hard drive in the location C:\Perl.

The "new features in PPM" screen talks about a PPM profile feature, but that requires ASPN (the full, commercial version) Perl, which you probably aren't installing right now. Leave the "Enable PPM3 to send profile info to ASPN" box unchecked.

Under Choose Setup Options, both "Add Perl to the PATH environment variable" and "Create Perl file extension association" should be checked.

The installer will finish up by installing HTML documentation. This step will take a while so be patient. When it's finished, your browser will launch and bring up the ActivePerl documentation:

Bookmark this page now (in your browser's favorites menu) so you can access it easily later.

Now Perl is installed. All you need to do now is modify the Apache server configuration.

Configuring Apache

First go to the Start menu and go to "My Documents". Make a new folder there called "My Website". This is where you're going to store your web pages and CGI programs. Next you need to modify the Apache configuration file to tell it where your pages are, and enable CGI programs. Go back to the Start menu and navigate to All Programs > Apache HTTP Server > Configure Apache Server > Edit the Apache httpd.conf Configuration file. The config file will be opened for you in Notepad. Scroll down (or use Find) until you get to the UserDir section of the file. It should have a line like this:
UserDir "My Documents/My Website"

Note: Apache 2.2 doesn't have a UserDir section. If you're using Apache 2.2, you'll have to ADD the UserDir line and the Directory section (see below). See http://httpd.apache.org/docs/2.2/mod/mod_userdir.html for more info on this.

Scroll down just past that and you'll come to a commented section for Directory:
#<Directory "C:/Documents and Settings/*/My Documents/My Website">
#    AllowOverride FileInfo AuthConfig Limit
#    Options MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec
#    <Limit GET POST OPTIONS PROPFIND>
#        Order allow,deny
#        Allow from all
#    </Limit>
#    <LimitExcept GET POST OPTIONS PROPFIND>
#        Order deny,allow
#        Deny from all
#    </LimitExcept>
#</Directory>

Uncomment this entire section (by removing the pound signs at the beginning of each line), and change the Options line to this:
Options MultiViews Indexes SymLinksIfOwnerMatch Includes ExecCGI

Options specifies what options are available in this directory. The important ones here are Includes, which enables server-side includes, and ExecCGI, which enables CGI programs in this directory. (Indexes, which was already present, lets the server show a directory listing when no index file exists.)

Scroll down a bit further to the DirectoryIndex line, and add index.cgi to the end of that line:
DirectoryIndex index.html index.html.var index.cgi

Now scroll down several pages (or use Find) to the AddHandler section. Uncomment the CGI line:
AddHandler cgi-script .cgi

This causes any file with a .cgi extension to be processed as a CGI program. If you want to also have files with a .pl extension be processed as CGI programs, add the .pl extension on that same line:
AddHandler cgi-script .cgi .pl

Next add this line immediately after:


AddHandler server-parsed .html

This causes all .html files to be searched for server-side include tags. Now save the configuration file, and restart Apache. Check http://localhost/ in your browser to ensure that the server restarted successfully.

Trouble? If you get an error like the following:
Only one usage of each socket address (protocol/network address/port) is normally permitted. : make_sock: could not bind to address 0.0.0.0:80
no listening sockets available, shutting down

This probably means you're already running another web server (such as IIS) on your machine. You'll need to remove IIS in order to run Apache. See the following Microsoft document on How to Remove IIS.
Viewing Your Site

http://localhost/ is the homepage for your site; it shows the index.html page located in the htdocs folder.

To view the pages in your "My Website" folder, the actual URL is http://localhost/~my username/. For example, on my computer, my username is "Jackie Hamilton", so the URL to my pages is http://localhost/~Jackie Hamilton/. (If you don't know your username, open the Start menu; your username is at the top of the Start box.) In your browser, go ahead and type in the URL to your web page. If you remembered to create the "My Website" folder earlier, you should now see an empty directory listing. Bookmark the page so you don't have to type in the long URL any more.

Writing Your CGI Programs

Now you're ready to write some CGI programs! Here's a simple one you can use to get started. You can write this in Notepad:
#!/perl/bin/perl -wT
print "Content-type: text/html\n\n";
print "<h2>Hello, World!</h2>\n";

Unfortunately Notepad has a nasty habit of appending .txt to the end of all text files, so when you go to save this file, change the "Save as Type" from "Text Documents" to "All Files". Then put "first.cgi" as the file name. Save it in your My Website folder, then reload your web page in your browser. You should see first.cgi listed there; click on it to view your first CGI program! Now go to Chapter 1 to start learning CGI programming.
Other Perl Editors

You can get by just fine by writing all of your CGI programs in Notepad. But you might find it more helpful to use a proper Perl editor for writing code. ActiveState (the generous folks who provide ActivePerl for free) also sells Visual Perl, a Perl plug-in for Visual Studio .NET. EditPlus is a shareware ($30) text/HTML/programming editor with syntax highlighting for various languages. The DzSoft Perl Editor offers syntax coloring (and checking), a built-in "run" option that you can use to test your scripts (and view error messages), a file template for new files, quick-insert shortcuts, and other useful tools. This program is shareware ($49) but a demo is available for download and evaluation.

OptiPerl is a visual development environment and editor for creating, testing, debugging and running Perl scripts. A free trial download is available, and if you decide to keep it, the standard license is $39. Perl Editor by EngInSite is an integrated development environment for creating, testing and debugging Perl scripts.

Troubleshooting

If you get an error like this when you try to start Apache:
Only one usage of each socket address (protocol/network address/port) is normally permitted. : make_sock: could not bind to address 0.0.0.0:80
no listening sockets available, shutting down
Unable to open logs

This probably means you already have another web server program (like IIS) running. You'll need to turn the other one off before you can start Apache. To disable this, go to the Control Panel->Administrative Tools->Services, and look for the IIS service. Right-click to stop the service.

Getting Started with CGI Programming in C


Content

Why CGI programming?
A basic example
Analysis of the example
So what is CGI programming?
Using a C program as a CGI script
The Hello world test
How to process a simple form
Using METHOD="POST"
Further reading

This is an introduction to writing CGI programs in the C language. The reader is assumed to know the basics of C as well as how to write simple forms in HTML and to be able to install CGI scripts on a Web server. The principles are illustrated with very simple examples. Two important warnings:

To avoid wasting your time, please check (from applicable local documents or by contacting the local webmaster) whether you can install and run CGI scripts written in C on the server. At the same time, please check how to do that in detail, specifically where you need to put your CGI scripts.

This document was written to illustrate the idea of CGI scripting to C programmers. In practice, CGI programs are usually written in other languages, such as Perl, and for good reasons: except for very simple cases, CGI programming in C is clumsy and error-prone.

Why CGI programming?


As my document How to write HTML forms briefly explains, you need a server-side script in order to use HTML forms reliably. Typically, there are simple server-side scripts available for simple, common ways of processing form submissions, such as sending the data in text format by E-mail to a specified address. However, for more advanced processing, such as collecting data into a file or database, or retrieving information and sending it back, or doing some calculations with the submitted data, you will probably need to write a server-side script of your own.

CGI is simply an interface between HTML forms and server-side scripts. It is not the only possibility; see the excellent tutorial How the web works: HTTP and CGI explained by Lars Marius Garshol for both an introduction to the concepts of CGI and notes on other possibilities.

If someone suggests using JavaScript as an alternative to CGI, ask him to read my JavaScript and HTML: possibilities and caveats. Briefly, JavaScript is inherently unreliable, at least if not backed up with server-side scripting.

A basic example
The above-mentioned How the web works: HTTP and CGI explained is a great tutorial. The following introduction of mine is just another attempt to present the basics; please consult other sources if you get confused or need more information. Let us consider the following simple HTML form:

<form action="http://www.cs.tut.fi/cgi-bin/run/~jkorpela/mult.cgi">
<div><label>Multiplicand 1: <input name="m" size="5"></label></div>
<div><label>Multiplicand 2: <input name="n" size="5"></label></div>
<div><input type="submit" value="Multiply!"></div>
</form>

It will look like the following on your current browser:
Multiplicand 1:

Multiplicand 2:
Multiply!

You can try it if you like. Just in case the server used isn't running and accessible when you try it, here's what you would get as the result:
Multiplication results

The product of 4 and 9 is 36.

Analysis of the example


We will now analyze how the example above works. Assume that you type 4 into one input field and 9 into another and then invoke submission, typically by clicking on a submit button. Your browser will send, by the HTTP protocol, a request to the server at www.cs.tut.fi. The browser picks up this server name from the value of the ACTION attribute, where it occurs as the host name part of a URL. (Quite often, the ACTION attribute refers, often using a relative URL, to a script on the same server as the document resides on, but this is not necessary, as this example shows.) When sending the request, the browser provides additional information, specifying a relative URL, in this case
/cgi-bin/run/~jkorpela/mult.cgi?m=4&n=9

This was constructed from that part of the ACTION value that follows the host name, by appending a question mark ? and the form data in a specifically encoded format. The server to which the request was sent (in this case, www.cs.tut.fi) will then process it according to its own rules. Typically, the server's configuration defines how the relative URLs are mapped to file names and which directories/folders are interpreted as containing CGI scripts. As you may guess, the part cgi-bin/ in the URL causes such interpretation in this case. This means that instead of just picking up and sending back (to the browser that sent the request) an HTML document or some other file, the server invokes a script or a program specified in the URL (mult.cgi in this case) and passes some data to it (the data m=4&n=9 in this case).
It depends on the server how this really happens. In this particular case, the server actually runs the (executable) program in the file mult.cgi in the subdirectory cgi-bin of user jkorpela's home directory. It could be something quite different, depending on server configuration.
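To make the "specifically encoded format" of the form data more concrete, here is a rough sketch in C of how a single form value could be encoded before it is appended to the URL. The function name form_encode is hypothetical, and the sketch is deliberately conservative: it copies only letters and digits (as classified by isalnum in the default C locale) unchanged, whereas real browsers also leave a few punctuation characters such as - and _ unencoded.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Illustrative sketch (hypothetical function): encode one form value.
   Letters and digits are copied, a space becomes '+', and every other
   byte becomes %XX (two hexadecimal digits).
   Returns 0 on success, -1 if dest is too small. */
int form_encode(const char *src, char *dest, size_t size)
{
    size_t used = 0;

    if (size == 0)
        return -1;
    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;
        if (isalnum(c)) {
            if (used + 1 >= size) return -1;
            dest[used++] = (char)c;
        } else if (c == ' ') {
            if (used + 1 >= size) return -1;
            dest[used++] = '+';
        } else {
            /* Needs three characters plus room for the final NUL. */
            if (used + 3 >= size) return -1;
            sprintf(dest + used, "%%%02X", (unsigned int)c);
            used += 3;
        }
    }
    dest[used] = '\0';
    return 0;
}
```

With this sketch, the values 4 and 9 pass through unchanged, which is why the example URL ends in plain m=4&n=9, while a value such as "a b&c" would be sent as a+b%26c.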

So what is CGI programming?


The often-mystified abbreviation CGI, for Common Gateway Interface, refers just to a convention on how the invocation and parameter passing takes place in detail.

Invocation means different things in different cases. For a Perl script, the server would invoke a Perl interpreter and make it execute the script in an interpretive manner. For an executable program, which has typically been produced by a compiler and a loader from a source program in a language like C, it would just be started as a separate process. Although the word script typically suggests that the code is interpreted, the term CGI script refers both to such scripts and to executable programs. See the answer to the question "Is it a script or a program?" in the CGI Programming FAQ by Nick Kew.

Using a C program as a CGI script


In order to set up a C program as a CGI script, it needs to be turned into a binary executable program. This is often problematic, since people largely work on Windows whereas servers often run some version of UNIX or Linux. The system where you develop your program and the server where it should be installed as a CGI script may have quite different architectures, so that the same executable does not run on both of them. This may create an unsolvable problem. If you are not allowed to log on to the server and you cannot use a binary-compatible system (or a cross-compiler) either, you are out of luck. Many servers, however, allow you to log on and use the server in interactive mode, as a shell user, and contain a C compiler.

You need to compile and load your C program on the server (or, in principle, on a system with the same architecture, so that binaries produced for it are executable on the server too).
Normally, you would proceed as follows:
1. Compile and test the C program in normal interactive use.
2. Make any changes that might be needed for use as a CGI script. The program should read its input according to the intended form submission method. Using the default GET method, the input is to be read from the environment variable QUERY_STRING. (The program may also read data from files, but these must then reside on the server.) It should generate output on the standard output stream (stdout) so that it starts with suitable HTTP headers. Often, the output is in HTML format.
3. Compile and test again. In this testing phase, you might set the environment variable QUERY_STRING so that it contains the test data as it will be sent as form data. E.g., if you intend to use a form where a field named foo contains the input data, you can give the command setenv QUERY_STRING "foo=42" (when using the tcsh shell) or export QUERY_STRING="foo=42" (when using the bash shell; without export, the variable is not passed on to the program you run).
4. Check that the compiled version is in a format that works on the server. This may require a recompilation. You may need to log on to the server computer (using Telnet, SSH, or some other terminal emulator) so that you can use a compiler there.
5. Upload the compiled and loaded program, i.e. the executable binary program (and any data files needed), to the server.
6. Set up a simple HTML document that contains a form for testing the script, etc.

You need to put the executable into a suitable directory and name it according to server-specific conventions. Even the compilation commands needed here might differ from what you are used to on your workstation. For example, if the server runs some flavor of Unix and has the Gnu C compiler available, you would typically use a compilation command like gcc -o mult.cgi mult.c and then move (mv) mult.cgi to a directory with a name like cgi-bin. Instead of gcc, you might need to use cc. You really need to check local instructions for such issues. The filename extension .cgi has no fixed meaning in general. However, there can be server-dependent (and operating system dependent) rules for naming executable files. Typical extensions for executables are .cgi and .exe.

The Hello world test


As usual when starting work with some new programming technology, you should probably first make a trivial program work. This avoids fighting with many potential problems at a time and lets you concentrate first on the issues specific to the environment, here CGI. You could use the following program that just prints Hello world, preceded by HTTP headers as required by the CGI interface. Here the header specifies that the data is plain ASCII text.

#include <stdio.h>

int main(void)
{
  /* Header line, then the blank line that ends the headers. */
  printf("Content-Type: text/plain;charset=us-ascii\n\n");
  printf("Hello world\n\n");
  return 0;
}

After compiling, loading, and uploading, you should be able to test the script simply by entering the URL in the browser's address bar. You could also make it the destination of a normal link in an HTML document. The URL of course depends on how you set things up; the URL for my installed Hello world script is the following: http://www.cs.tut.fi/cgi-bin/run/~jkorpela/hellow.cgi

How to process a simple form


For forms that use METHOD="GET" (as our simple example above uses, since this is the default), CGI specifications say that the data is passed to the script or program in an environment variable called QUERY_STRING.

It depends on the scripting or programming language used how a program can access the value of an environment variable. In the C language, you would use the library function getenv (declared in the standard header stdlib.h) to access the value as a string. You might then use various techniques to pick up data from the string, convert parts of it to numeric values, etc. The output from the script or program to the primary output stream (such as stdout in the C language) is handled in a special way. Effectively, it is directed so that it gets sent back to the browser. Thus, by writing a C program so that it writes an HTML document onto its standard output, you will make that document appear on the user's screen as a response to the form submission. In this case, the source program in C is the following:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
  char *data;
  long m,n;
  /* Header line ended by CR LF (13,10), then an empty line. */
  printf("%s%c%c\n",
  "Content-Type:text/html;charset=iso-8859-1",13,10);
  printf("<TITLE>Multiplication results</TITLE>\n");
  printf("<H3>Multiplication results</H3>\n");
  data = getenv("QUERY_STRING");
  if(data == NULL)
    printf("<P>Error! Error in passing data from form to script.");
  else if(sscanf(data,"m=%ld&n=%ld",&m,&n)!=2)
    printf("<P>Error! Invalid data. Data must be numeric.");
  else
    printf("<P>The product of %ld and %ld is %ld.",m,n,m*n);
  return 0;
}
As a disciplined programmer, you have probably noticed that the program makes no check against integer overflow, so it will return bogus results for very large operands. In real life, such checks would be needed, but such considerations would take us too far from our topic.
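For readers who do want such a check, one possible approach is sketched below. It is not part of the original program; the function name checked_mul is made up here, and the technique (comparing one operand against LONG_MAX or LONG_MIN divided by the other before multiplying) is just one common way to detect overflow in portable C.

```c
#include <limits.h>

/* Sketch only (hypothetical helper): multiply a and b, storing the
   product in *result. Returns 1 on success and 0 if the product would
   not fit in a long; *result is left untouched in that case. */
int checked_mul(long a, long b, long *result)
{
    if (a == 0 || b == 0) {
        *result = 0;
        return 1;
    }
    if (a > 0) {
        if (b > 0) { if (a > LONG_MAX / b) return 0; }  /* both positive */
        else       { if (b < LONG_MIN / a) return 0; }  /* a > 0, b < 0  */
    } else {
        if (b > 0) { if (a < LONG_MIN / b) return 0; }  /* a < 0, b > 0  */
        else       { if (a < LONG_MAX / b) return 0; }  /* both negative */
    }
    *result = a * b;
    return 1;
}
```

In the multiplication program, the final printf could then be guarded by a call such as checked_mul(m, n, &product), printing an error paragraph instead of a result when the function returns 0.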

Note: The first printf function call prints out data that will be sent by the server as an HTTP header. This is required for several reasons, including the fact that a CGI script can send any data (such as an image or a plain text file) to the browser, not just HTML documents. For HTML documents, you can just use the printf function call above as such; however, if your character encoding is different from ISO 8859-1 (ISO Latin 1), which is the most common on the Web, you need to replace iso-8859-1 by the registered name of the encoding (charset) you use.

I have compiled this program and saved the executable program under the name mult.cgi in my directory for CGI scripts at www.cs.tut.fi. This implies that any form with action="http://www.cs.tut.fi/cgi-bin/run/~jkorpela/mult.cgi" will, when submitted, be processed by that program.

Consequently, anyone could write a form of his own with the same ACTION attribute and pass whatever data he likes to my program. Therefore, the program needs to be able to handle any data. Generally, you need to check the data before starting to process it.

Using METHOD="POST"
The idea of METHOD="POST"

Let us consider next a different processing for form data. Assume that we wish to write a form that takes a line of text as input so that the form data is sent to a CGI script that appends the data to a text file on the server. (That text file could be readable by the author of the form and the script only, or it could be made readable to the world through another script.)

It might seem that the problem is similar to the example considered above; one would just need a different form and a different script (program). In fact, there is a difference. The example above can be regarded as a pure query that does not change the state of the world. In particular, it is idempotent, i.e. the same form data could be submitted as many times as you like without causing any problems (except minor waste of resources). However, our current task needs to cause such changes: a change in the content of a file that is intended to be more or less permanent. Therefore, one should use METHOD="POST". This is explained in more detail in the document Methods GET and POST in HTML forms - what's the difference? Here we will take it for granted that METHOD="POST" needs to be used and we will consider the technical implications.

For forms that use METHOD="POST", CGI specifications say that the data is passed to the script or program in the standard input stream (stdin), and the length (in bytes, i.e. characters) of the data is passed in an environment variable called CONTENT_LENGTH.
Reading input

Reading from standard input probably sounds simpler than reading from an environment variable, but there are complications. The server is not required to pass the data so that when the CGI script tries to read more data than there is, it would get an end of file indication! That is, if you read e.g. using the getchar function in a C program, it is undefined what happens after reading all the data characters; it is not guaranteed that the function will return EOF.

When reading the input, the program must not try to read more than CONTENT_LENGTH characters.
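One way to respect that rule is sketched below: the length string is validated first, and then exactly that many bytes are requested, so the program never relies on an end-of-file indication. The helper name read_body and its return convention are assumptions for illustration only.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read exactly the number of bytes announced in
   lenstr (the CONTENT_LENGTH value) from in into buf, NUL-terminated.
   Returns the byte count, or -1 if the length is missing, malformed,
   too large for buf, or the stream ends early. */
long read_body(FILE *in, const char *lenstr, char *buf, size_t bufsize)
{
    char *end;
    long len;

    if (lenstr == NULL || *lenstr == '\0' || bufsize == 0)
        return -1;
    len = strtol(lenstr, &end, 10);
    if (*end != '\0' || len < 0 || (unsigned long)len >= bufsize)
        return -1;
    /* Ask for len bytes and no more. */
    if (fread(buf, 1, (size_t)len, in) != (size_t)len)
        return -1;
    buf[len] = '\0';
    return len;
}
```

In a CGI program the call would look something like read_body(stdin, getenv("CONTENT_LENGTH"), input, sizeof input).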
Sample program: accept and append data

A relatively simple C program for accepting input via CGI and METHOD="POST" is the following:

#include <stdio.h>
#include <stdlib.h>
#define MAXLEN 80
#define EXTRA 5
/* 4 for field name "data", 1 for "=" */
#define MAXINPUT (MAXLEN+EXTRA+2)
/* 1 for added line break, 1 for trailing NUL */
#define DATAFILE "../data/data.txt"

/* Decode the URL-encoded text in [src, last) into dest and add a line break. */
void unencode(char *src, char *last, char *dest)
{
  for(; src < last; src++, dest++)
    if(*src == '+')
      *dest = ' ';
    else if(*src == '%' && last - src >= 3) {
      /* "%" followed by two more characters inside the input */
      unsigned int code;
      if(sscanf(src+1, "%2x", &code) != 1) code = '?';
      *dest = code;
      src += 2;
    }
    else
      *dest = *src;
  *dest = '\n';
  *++dest = '\0';
}

int main(void)
{
  char *lenstr;
  char input[MAXINPUT], data[MAXINPUT];
  long len;
  printf("%s%c%c\n",
         "Content-Type:text/html;charset=iso-8859-1",13,10);
  printf("<TITLE>Response</TITLE>\n");
  lenstr = getenv("CONTENT_LENGTH");
  if(lenstr == NULL || sscanf(lenstr,"%ld",&len)!=1
     || len < EXTRA || len > MAXLEN)
    printf("<P>Error in invocation - wrong FORM probably.");
  else if(fgets(input, len+1, stdin) == NULL)
    printf("<P>Error in invocation - no data received.");
  else {
    FILE *f;
    unencode(input+EXTRA, input+len, data);
    f = fopen(DATAFILE, "a");
    if(f == NULL)
      printf("<P>Sorry, cannot store your data.");
    else {
      fputs(data, f);
      fclose(f);   /* close only a file that was actually opened */
      printf("<P>Thank you! Your contribution has been stored.");
    }
  }
  return 0;
}

Essentially, the program retrieves the number of characters in the input from the value of the CONTENT_LENGTH environment variable. Then it unencodes (decodes) the data, since the data arrives in the specifically encoded format that was already mentioned. The program has been written for a form where the text input field has the name data (actually, just the length of the name matters here). For example, if the user types
Hello there!

then the data will be passed to the program encoded as


data=Hello+there%21

(with space encoded as + and exclamation mark encoded as %21). The unencode routine in the program converts this back to the original format. After that, the data is appended to a file (with a fixed file name), and a confirmation message is sent back to the user. Having compiled the program I have saved it as collect.cgi into the directory for CGI scripts. Now a form like the following can be used for data submissions:

<FORM ACTION="http://www.cs.tut.fi/cgi-bin/run/~jkorpela/collect.cgi" METHOD="POST">
<DIV>Your input (80 chars max.):<BR>
<INPUT NAME="data" SIZE="60" MAXLENGTH="80"><BR>
<INPUT TYPE="SUBMIT" VALUE="Send"></DIV>
</FORM>
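For comparison, here is a minimal sketch of a decoder that takes explicit lengths and never writes past the destination buffer. It is not the routine used in collect.cgi; the name url_decode, the choice to copy a malformed "%" through unchanged, and the return convention are all assumptions of this example.

```c
#include <ctype.h>
#include <stddef.h>

/* Value of one hexadecimal digit; the caller has checked isxdigit(). */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return tolower(c) - 'a' + 10;
}

/* Hypothetical helper: decode len bytes of URL-encoded text from src
   into dest (capacity destsize, including the trailing NUL).
   Returns the decoded length, or -1 if dest is too small. */
long url_decode(const char *src, size_t len, char *dest, size_t destsize)
{
    size_t i, n = 0;

    if (destsize == 0)
        return -1;
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)src[i];
        if (n + 1 >= destsize)
            return -1;                       /* keep room for the NUL */
        if (c == '+') {
            dest[n++] = ' ';
        } else if (c == '%' && i + 2 < len
                   && isxdigit((unsigned char)src[i + 1])
                   && isxdigit((unsigned char)src[i + 2])) {
            dest[n++] = (char)(hex_value(src[i + 1]) * 16
                               + hex_value(src[i + 2]));
            i += 2;                          /* skip the two hex digits */
        } else {
            dest[n++] = (char)c;             /* includes a malformed '%' */
        }
    }
    dest[n] = '\0';
    return (long)n;
}
```

Passing the lengths explicitly keeps the bounds checks in one place instead of relying on the caller to size the buffers correctly.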
Sample program: view data stored on a file

Finally, we can write a simple program for viewing the data; it only needs to copy the content of a given text file onto standard output:

#include <stdio.h>
#include <stdlib.h>
#define DATAFILE "../data/data.txt"

int main(void)
{
  FILE *f = fopen(DATAFILE,"r");
  int ch;
  if(f == NULL) {
    printf("%s%c%c\n",
           "Content-Type:text/html;charset=iso-8859-1",13,10);
    printf("<TITLE>Failure</TITLE>\n");
    printf("<P><EM>Unable to open data file, sorry!</EM>");
  }
  else {
    printf("%s%c%c\n",
           "Content-Type:text/plain;charset=iso-8859-1",13,10);
    while((ch=getc(f)) != EOF)
      putchar(ch);
    fclose(f);
  }
  return 0;
}

Notice that this program prints (when successful) the data as plain text, preceded by a header that says this, i.e. has text/plain instead of text/html. A form that invokes that program can be very simple, since no input data is needed:

<form action="http://www.cs.tut.fi/cgi-bin/run/~jkorpela/viewdata.cgi">
<div><input type="submit" value="View"></div>
</form>

Finally, here's what the two forms look like. You can now test them:
Form for submitting data

Please notice that anything you submit here will become visible to the world:
[Form: a text field labelled "Your input (80 chars max.):" and a "Send" button]

Form for checking submitted data

The content of the text file to which the submissions are stored will be displayed as plain text.
[Form: a single "View" button]

Even though the output is declared to be plain text, Internet Explorer may interpret it partly as containing HTML markup. Thus, if someone enters data that contains such markup, strange things would happen. The viewdata.c program takes this into account by writing the NUL character ('\0') after each occurrence of the less-than character <, so that it will not be taken (even by IE) as starting a tag.
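A different and more common technique, which is not what viewdata.c does, is to serve the data as text/html and replace the characters that are significant in markup with character references. The sketch below shows that idea; the helper name put_html_escaped is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: write text to out, replacing the characters that
   are significant in HTML markup with character references. */
void put_html_escaped(const char *text, FILE *out)
{
    for (; *text != '\0'; text++) {
        switch (*text) {
        case '<': fputs("&lt;", out);   break;
        case '>': fputs("&gt;", out);   break;
        case '&': fputs("&amp;", out);  break;
        case '"': fputs("&quot;", out); break;
        default:  fputc(*text, out);    break;
        }
    }
}
```

With this approach a stored line such as <b>hi</b> is displayed literally by the browser rather than being interpreted as a tag.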

Further reading
You may now wish to read The CGI specification, which tells you all the basic details about CGI. The next step is probably to see what the CGI Programming FAQ contains. Beware that it is relatively old. There is a lot of material, including introductions and tutorials, in the CGI Resource Index. Notice in particular the section Programs and Scripts: C and C++: Libraries and Classes, which contains libraries that can make it easier to process form data. It can be instructive to parse simple data formats by using code of your own, as was done in the simple examples above, but in practical applications a library routine might be better.

The C language was originally designed for an environment where only ASCII characters were used. Nowadays, it can be used, with caution, for processing 8-bit characters. There are various ways to overcome the limitation that in C implementations, a character is generally an 8-bit quantity. See especially the last section in my book Unicode Explained.

Basic CGI Programming
Written by Valerie Mates, May 18, 1999

CGI programs generate web pages on the fly. When you type text in boxes on a web page and press a button to submit the data, you are running a CGI program. This page describes how to write a CGI program.
The Basics
At its most basic, a CGI program is one that reads an environment variable and writes out ordinary HTML. For example, here is a simple shell script CGI program:

#!/bin/sh
echo "Content-type: text/html"
echo ""
echo "Hello world!"

Ideally, the HTML from that script would include tags like <html> and <head> and <body>, but browsers will know what to do with it even if those are missing.
What Programming Language?
A CGI program can be written in any programming language. This page talks about CGI programming in Perl.

Where do I put the program?
CGI programs need to go in a "cgi-bin" directory. This is a special directory of programs that can be run by the web server. Unfortunately there is no standard location for a cgi-bin directory. To find yours, check your web server configuration or ask your system administrator.

Forms
You can send data to a CGI program from a form, either on a web page or from another CGI program. The HTML for a simple form might look like this:

<form action="/cgi-bin/foo.cgi" method=post>
Greeting: <input type="text" name="greeting" size=10 maxlength=20><br>
Your Name: <input type="text" name="your_name" size=20 maxlength=30><br>
<input type="submit" name="submit" value="Send">
</form>

In a web browser window, that HTML will produce this:


[Rendered form: text fields labelled "Greeting:" and "Your Name:", followed by a "Send" button]

It will run a program called foo.cgi in your cgi-bin directory.


cgi-lib.pl
For CGI programming, I use a library called cgi-lib.pl. It is available from cgi-lib.stanford.edu/cgi-lib/

To use cgi-lib in a Perl program, put it in the same directory as your program (or in your Perl search path) and include the line:
require("cgi-lib.pl");

cgi-lib has several very useful routines. One is called ReadParse.


Reading Input From A Form
If you include this line in your Perl program:
&ReadParse;

then all the variables on the form are put into a hash named %in. In your program you can refer to the variables like this: $in{'greeting'} and $in{'your_name'}, that is, $in{'name_of_variable'}.

HTML Headers
The first thing a CGI program must do, before displaying any text, is to tell the browser that the program will be sending text. One way to do this is to print the string Content-type: text/html followed by two newlines. The other option is to use a cgi-lib function called PrintHeader, which you do by including this line in your program:
print &PrintHeader;

If you leave out the HTML headers, you will get a web server error.
Using the Variables
Here is a sample Perl program that uses the variables from the form:

#!/usr/local/bin/perl
require("cgi-lib.pl");
&ReadParse;
print &PrintHeader;
print <<EOF;
<html>
<head>
<title>A Greeting From $in{'your_name'}</title>
</head>
<body bgcolor="#FFFFFF">
$in{'your_name'} sends you this greeting:<br>
<blockquote>$in{'greeting'}</blockquote>
</body>
</html>
EOF

When someone runs that program, its output will look something like this:
Jane Smith sends you this greeting: Hello, isn't this weather great?

Handling Errors
Normally in a Perl program, if an error condition occurs, you would use the Perl command die to display an error message and exit. However, you cannot do this in a CGI program. If you do this in a CGI program, the error message will be hidden away in a web server error log where the user cannot see it. The user will see only an error message that says something like "500 Server error". Instead of die, a CGI program should display an intelligent error message and then call exit. I wrote a routine called "crash" that I use. Here is the code for it:

#
# Subroutine to exit gracefully from errors:
#
sub crash{
    print $_[0];
    print "</td></tr></table></td></tr></table> </body></html>";
    exit;
}

The end-of-table code in the crash routine is useful because if you print a table without a </table> tag, the browser won't show anything in the table. The extra tags make sure that even if the crash occurs while you are in the middle of writing out a table, the error message will still be readable. Here is an example of a program that calls crash:

#!/usr/local/bin/perl
require("cgi-lib.pl");
&ReadParse;
print &PrintHeader;

# If greeting is blank, display error message and exit:
if ($in{'greeting'} eq "") {
    crash("Please enter a greeting. Press your browser's Back button to enter it.");
}

print <<EOF;
<html>
<head>
<title>A Greeting From $in{'your_name'}</title>
</head>
<body bgcolor="#FFFFFF">
$in{'your_name'} sends you this greeting:<br>
<blockquote>$in{'greeting'}</blockquote>
</body>
</html>
EOF

#
# Subroutine to exit gracefully from errors:
#
sub crash{
    print $_[0];
    print "</td></tr></table></td></tr></table> </body></html>";
    exit;
}

Debugging Tips
Some techniques that are useful for debugging CGI programs:

1. Use print statements. That is, if you want to know what the variable $in{'foo'} is set to, add a line that says:
print "$in{'foo'}<br>\n";

2. You can run CGI programs from the command line! That is, if the program foo.cgi keeps giving you errors when you run it from the web server, telnet to the server, change directory to your cgi-bin directory, and, at the command line, run your program: type ./foo.cgi. Perl is good about giving meaningful error messages.

3. Use the "tail" command to look at your web server error log. When your browser is giving you meaningless "500 error" messages, your web server's error log is likely to have a much more useful error message.

4. The error message Premature end of script headers may mean that your program wrote out some other text before the HTML headers.
Security
Question: What is wrong with this line of code?
system("log_to_database $in{'user_data'}");

Answer: This program runs a Unix command with user-supplied data. That is, it runs the command:
log_to_database something

where something could be anything at all. Suppose the user had entered this text: ; rm /. Then the Unix command that would be run is log_to_database ; rm /. That is, by adding a semicolon, the user terminated the log_to_database command and started a second command on the same command line. That second command in this case is (a mild version of) the command to delete all the files on the system. Since you don't want users running random commands on your system, be very careful what you do with user-supplied data. Avoid passing user-supplied data to system commands. If you must do so, first filter out all possible bad characters from the data, or, better yet, to avoid missing any special characters you haven't thought of, filter out all characters except the ones that are acceptable. For example, the command:
$in{'user_data'} =~ s/[^A-Z0-9]//gio;

will remove all non-alphanumeric characters from the variable $in{'user_data'}. If the user enters ; rm / and you run that substitution, the user's entry will be pared down to only its alphanumeric characters, which in this case are the letters "rm" without the dangerous semicolon. Now you can safely run the command as log_to_database rm, which will merely log the letters "rm" to the database, which is vastly preferable to deleting all the files on your system! Be careful too about filenames. If the user enters a filename, beware of allowing a carefully placed .. or other special characters to overwrite a file in a directory other than the one where you intended the data to be stored.
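The same "keep only what is acceptable" idea can be sketched in C. Both helper names below (keep_alnum and is_plain_filename) are hypothetical, and the filename rule shown (no directory separators, no leading dot) is one possible policy chosen for illustration, not the only reasonable one.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: remove, in place, every character that is not a
   letter or digit (as classified by isalnum in the current locale). */
void keep_alnum(char *s)
{
    char *dst = s;

    for (; *s != '\0'; s++)
        if (isalnum((unsigned char)*s))
            *dst++ = *s;
    *dst = '\0';
}

/* Hypothetical helper: accept only plain file names, i.e. names that
   are non-empty, contain no directory separators, and do not start
   with a dot (which also rejects "." and ".."). */
int is_plain_filename(const char *name)
{
    if (name == NULL || name[0] == '\0' || name[0] == '.')
        return 0;
    return strpbrk(name, "/\\") == NULL;
}
```

Applied to the input ; rm / the first function leaves just rm, matching the result of the Perl substitution above.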
General
The Common Gateway Interface (CGI) is a standard for interfacing external applications with Web servers. Unlike a plain HTML document, which returns only static information, a CGI program is executed in real time, so that it can output dynamic information.

1. Instructions to setup CGI programs
2. Setup authenticated CGI programs

Instructions to setup CGI programs

1. Before you can write your own CGI program, you will need to have an account on the Teaching Web Server. If you haven't got an account on the Teaching Web Server yet, please refer to our web page on "Who can apply" for more information.

2. You have to place your CGI programs in a directory called cgi-bin in your public_html directory. Use the following command to create the "cgi-bin" directory first:
mkdir $HOME/public_html/cgi-bin

3. From now on, you can place your CGI programs under the directory $HOME/public_html/cgi-bin and the URL to access your CGI program is:
http://teaching.ust.hk/cgi-bin/cgiwrap/~course_code/CGI_program_name

4. Examples

o The URL for course comp123 with the program "testprog.cgi" is
http://teaching.ust.hk/cgi-bin/cgiwrap/~comp123/testprog.cgi
or
http://teaching.ust.hk/cgi-bin/cgiwrap/comp123/testprog.cgi

o To execute the same script but get debugging/unformatted output:
http://teaching.ust.hk/cgi-bin/cgiwrapd/comp123/testprog.cgi

o To query the imagemap "mymap.map" in the public_html directory of the course "comp123":
http://teaching.ust.hk/cgi-bin/imagemap/~comp123/mymap.map
or
http://teaching.ust.hk/~comp123/mymap.map (for image map only and file extension must be .map)

5. Note to C programmers
If you are writing your CGI programs in C, the dynamic linker may warn you at run time that the library you are using is older than expected. Because this warning is usually emitted as the first line of output, it prevents the CGI program from working as expected. The solution is to recompile your C program under Solaris before you run it on the Web, as our Web Server is now running Solaris 2.6.

Setup authenticated CGI programs

1. First read Instructions to setup CGI programs in order to understand the basic CGI program setup procedure.

2. Because cgiwrap does not support a .htaccess file placed in your $HOME/public_html/cgi-bin, the web server does not read or follow any commands in a .htaccess file there.

3. To execute your CGI program with authentication, you should use the following URL instead:
http://teaching.ust.hk/cgi-bin/auth/cgiwrap/course_code/CGI_program_name
or
http://teaching.ust.hk/cgi-bin/auth/cgiwrap/~course_code/CGI_program_name

4. We have put a generic .htaccess (see below) in URI /cgi-bin/auth/cgiwrap, so the web server will request authentication when the above URL is accessed. After the correct ITSC account and password have been entered, CGI environment variables such as REMOTE_USER and REMOTE_HOST are passed along to your CGI program.

AuthName HKUST
AuthType Basic
<Limit GET POST>
require valid-user
</Limit>

5. It is your task to check, based on the variable REMOTE_USER, who is authorized to execute the CGI program.

6. This setup differs from general web authentication because we use cgiwrap to execute user CGI programs. In traditional web authentication, both authentication (who you are) and authorization (whether you are allowed to do that) are handled or defined by the web server and the .htaccess file. Here, authentication is still handled by the web server, but authorization is the responsibility of your CGI program.

7. Under this setup, please note that you cannot define your own username and password pair for authentication.

CGI Programming Is Simple!

That's a bold claim, isn't it?

I bet you are no stranger to computer programming. Why else would you be reading this? You know that nothing about computers is simple! And that's precisely why I can make such a bold claim. Why? Because, as long as you know how to write any kind of computer programs, you already know everything you need to know about writing CGI programs. Yes, everything!

What is CGI?
There seems to be a lot of confusion even among experienced programmers about it. Myths abound. I am sure you have heard at least some of them. Which is why I would like to tell you first what CGI is not.

It is not a programming language. That means, for example:
o You do not have to learn Perl
o You can use the languages you already know
o You can use any language as long as it
  o can read input
  o can write output
And what computer language can't? For that matter, you do not need to use a language.

It is not a programming style. You can use your own.

It is not cryptic. Perl is cryptic, all right, but see above: You don't need to use Perl.

It is not for Unix gurus only. In fact, you don't have to be any kind of guru. All you need is to know how to program. And you already know that!

NOTE: Please don't misconstrue me. I have nothing against Perl. But from browsing the web you may get the impression you must learn it. All I'm saying is that you don't have to. But if you want to, be my guest.

ANOTHER NOTE: If you don't know anything about programming, you need to learn that first. But you can still continue reading.

All right, already! What is it?


Quite simply, CGI stands for Common Gateway Interface. That's a fancy term for something we all know as Application Programming Interface. So, CGI is the API for the web server. The web server, of course, is the software that sends web pages to web browsers. Technically, web browsers should be called web clients, and people who use them should be called web browsers. But no need to get technical here.

What does a server do?


Essentially, it waits. Unless the site is very busy, of course. What does it wait for? For a client, I mean browser, to ask for a file. The file can be an HTML document, or a graphic, or just about any kind of file. Once the server receives the request, it does three things:

1. It sends a line of plain text which explains what kind of file is being sent, i.e. HTML, or GIF, or whatever else.
2. It sends out a blank line.
3. It sends out the contents of the file.

In that order.

How many files does it send?


One. Now you may be shaking your head. After all, a typical web page consists of an HTML document and some graphics, each of them residing in a different file. That, of course, is true. Nevertheless, during a single session, a web server sends out only one file. The browser must start a new session for each and every file it needs to get. And since all servers and most browsers are perfectly capable of multitasking, they can have several sessions running at the same time. But they do need a separate session for every file.

Does it have to be a file?


Not necessarily. All that is transferred is data. Remember: The server and the client (the browser) usually run on different computers. They may run under different operating systems, even with different microprocessors. The browser really asks for a resource and does not know, or care, where the server gets the data from. Nevertheless, a typical server is programmed to get its data from a file. It simply reads the data from the file and sends it to the client during the last of the three steps I talked about before. As a result of this process, the server only sends static data. That is to say, the server does not dynamically modify the data.

But I want to send dynamic data!


And you can! Quite easily at that. This is precisely what CGI was designed for. You simply write a program that produces data dynamically. Your data then goes to the browser instead of a file. That way your CGI program effectively extends the functionality of the server, just as, for example, a DLL extends the functionality of Windows. Except, CGI is much simpler than anything you would write for Windows.

But, how do I talk to the browser?


You don't. The server handles that for you. In fact, the beauty of it is that you do not even have to talk to the server. All you do is write to standard output. So, for example, in C you could use printf(). The only thing you have to take care of is using all three steps I talked about before. Since the server does not know what kind of data you are outputting, you need to write that information to standard output yourself. Do you remember I said you could even do it without the use of a programming language? Let's suppose, for example, that your server is running under MS DOS. Well, none of them do, but there are Windows servers, and Windows can handle MS DOS commands.

Now, let us say you would like to send the listing of your current directory to the web (not a good idea, but it shows just how simple it is). Well, MS DOS has the dir command that sends the directory listing to standard output.

What about the first two steps?


True, dir will not explain it is sending plain text before sending it. But never fear! You can write a batch file like this:

echo Content-type: text/plain
echo.
dir

The first line of this batch file tells the browser to expect plain text. The second sends a blank line. The third lists the contents of the current directory. A disclaimer is in order here: Since my web site is on a Unix server, I could not test this. I know you can use Unix shells for similar purposes. I do not know whether Windows servers let you use batch files for CGI. But since more people understand MS DOS batch files than Unix shells, I chose this example.
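The same three steps can be sketched in C. The helper name write_response is hypothetical; it simply writes the content type line, the blank line, and the body, in that order, to whatever stream it is given.

```c
#include <stdio.h>

/* Hypothetical helper: write a complete CGI response to out.
   Returns 0 on success, -1 if any write fails. */
int write_response(FILE *out, const char *content_type, const char *body)
{
    /* Step 1: say what kind of data follows. */
    if (fprintf(out, "Content-type: %s\n", content_type) < 0)
        return -1;
    /* Step 2: the blank line that ends the header. */
    if (fputc('\n', out) == EOF)
        return -1;
    /* Step 3: the data itself. */
    if (fputs(body, out) == EOF)
        return -1;
    return 0;
}
```

A CGI program would pass stdout as the stream, for example write_response(stdout, "text/plain", "Hello, world!\n").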

How do I get input?


First off, let me emphasize that the web is not interactive. That is, your CGI program cannot ask the user for input, process it, send out some output, ask for more input etc. But this is one of the reasons why CGI programming is so simple. The program receives user input at most once, right at the start, and sends output once. Nevertheless, both the input and output can be of any size your program can handle. That said, your program can receive user input in one of two ways depending on what method the browser uses to send it to the server.

Where does the browser find user input?


The browser receives user input using HTML forms. A form can instruct the browser to send the data in one of two methods: GET and POST. The GET method sends it to you as part of the URL. The POST method sends it as input from stdin. This seems to have several major advantages over using the URL:

o You can send more data (URL has a size limit).
o The data is not logged along the way. Sending a password, for example, as part of the URL leaves a trail in the various systems your data is travelling through!
o Data does not appear in the browser Location bar. Again, showing a password there may not be appreciated by the user if someone is watching over his shoulder.

How do I know which method is used?


Before the web server loads your CGI program, it sets several environment variables which you can study to know how much input data you are getting and where it is coming from (i.e. URL or stdin). One of these environment variables is REQUEST_METHOD. Its value can be POST, GET, or occasionally HEAD. In the first case, CONTENT_LENGTH tells you how many bytes of data you should read from stdin. And CONTENT_TYPE tells you that this data is coming from a form, or possibly from some other source. Once you have read the data, you can process it, and send your output to stdout. Of course, you will probably want to write it as HTML data, with all of its formatting. But CGI programs can produce any kind of output, for example a GIF file, or anything else. This is why, in the first two steps, you need to tell the browser just what kind of data you are sending it. For HTML you do it by sending the string Content-type: text/html followed by two line feeds before doing anything else. So, in C, you could code something like printf("Content-type: text/html\n\n");

Let's see an example


Armed with this knowledge, I wrote a simple C program which outputs its command line using argc and argv, then checks the environment variables I mentioned above, and if there is data at stdin, it reads it. It then sends all this information out in plain HTML. I have called this program c and placed it in my cgi-bin directory. I also created a simple HTML form. The code for the form is here:
<b>Pick your favorite color</b><br>
<form method="POST" action="http://www.whizkidtech.redprince.net/cgi-bin/c">
<input type="RADIO" name="color" value="red"> Red<br>
<input type="RADIO" name="color" value="green"> Green<br>
<input type="RADIO" checked name="color" value="blue"> Blue<br>
<input type="RADIO" name="color" value="cyan"> Cyan<br>
<input type="RADIO" name="color" value="magenta"> Magenta<br>
<input type="RADIO" name="color" value="yellow"> Yellow<br>
<br><b>On the scale 1 - 3, how favorite is it?</b><br><br>
<select name="scale" size=1>
<option>1
<option selected>2
<option>3
</select>
<br>
<input type="HIDDEN" name="favorite color" size="32">
<input type="Submit" value="I'm learning" name="Attentive student">
<input type="Submit" value="Give me a break!" name="Overachiever">
<input type="Reset" name="Reset">
</form>

Please note the existence of a hidden input with no assigned value, just to test what it sends to the program. You can play with this form and see what it sends to my program below. A thing to note is that the browser converts any spaces into plusses, and any other non-alphanumeric values into %xx, where xx is the hexadecimal version of its ASCII value. Fortunately, that is fairly simple to fix, and c will also show you the fixed input. After that, there is one thing left: You need to parse the input. By that I mean, you need to break it apart into pairs of key and value. Each pair is separated by an ampersand (&). Please note I said separated, not terminated. There is no ampersand after the last pair. Within each pair, the key part is on the left of an equal sign (=), while the value is on the right. Pretty much like an assignment in C and many other programming languages. To illustrate this, c parses the data and shows you the pairs. You will note that the favorite color key seems to have no value. But it does. It just happens to be spaces. Instruct your browser to show you the page source and take a look at the HTML code c produces to see what I am talking about. Here is the form, play with it as much as you want. Just click BACK after viewing the results to return here.
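The splitting into pairs described above can be sketched as a lookup function. This is not the code of the c program mentioned in the text; the name find_field is hypothetical, and the sketch compares keys and returns values in their still-encoded form, so decoding of + and %xx remains a separate step.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: look up key in form data such as "a=1&b=2" and
   copy its still-encoded value into value (capacity size).
   Returns 1 if the key was found and the value fit, 0 otherwise. */
int find_field(const char *data, const char *key, char *value, size_t size)
{
    size_t keylen = strlen(key);

    while (*data != '\0') {
        size_t pairlen = strcspn(data, "&");          /* one key=value pair */
        const char *eq = memchr(data, '=', pairlen);  /* first '=' in it */

        if (eq != NULL && (size_t)(eq - data) == keylen
                && memcmp(data, key, keylen) == 0) {
            size_t vallen = pairlen - keylen - 1;
            if (vallen >= size)
                return 0;                             /* value too long */
            memcpy(value, eq + 1, vallen);
            value[vallen] = '\0';
            return 1;
        }
        data += pairlen;
        if (*data == '&')
            data++;                                   /* skip the separator */
    }
    return 0;
}
```

Because the ampersand separates pairs rather than terminating them, the loop treats the end of the string as the end of the last pair.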

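[Form, submitted with POST: radio buttons Red, Green, Blue, Cyan, Magenta, Yellow under "Pick your favorite color"; a selector (1 to 3, default 2) under "On the scale 1 - 3, how favorite is it?"; buttons "I'm learning", "Give me a break!", and "Reset"]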

What about the other method?


You need to try the same form with one modification: Instead of POST, use GET. Note that this time the program will show no input from stdin. Unfortunately, it also gets no additional data in its argv array. Yet, if you take a look at the URL (which your browser should show you), you will see all the data placed there. The trick is in realizing that, despite appearances, the URL has nothing to do with the command line of the program. How, then, do you get to the data from within your program? You read it from the environment variable QUERY_STRING.
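Putting the two methods side by side, a program can decide where to look for its input from the value of REQUEST_METHOD alone. The sketch below is only an illustration; the enum and the function name form_input_source are hypothetical.

```c
#include <string.h>

enum input_source { INPUT_NONE, INPUT_QUERY_STRING, INPUT_STDIN };

/* Hypothetical helper: decide where the form data is, given the value
   of REQUEST_METHOD (NULL when the program is run outside a server). */
enum input_source form_input_source(const char *request_method)
{
    if (request_method == NULL)
        return INPUT_NONE;
    if (strcmp(request_method, "POST") == 0)
        return INPUT_STDIN;          /* CONTENT_LENGTH bytes on stdin */
    if (strcmp(request_method, "GET") == 0
            || strcmp(request_method, "HEAD") == 0)
        return INPUT_QUERY_STRING;   /* data is in QUERY_STRING */
    return INPUT_NONE;
}
```

A typical call would be form_input_source(getenv("REQUEST_METHOD")), followed by either reading stdin or calling getenv("QUERY_STRING").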

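[The same form, submitted with GET: radio buttons Red, Green, Blue, Cyan, Magenta, Yellow under "Pick your favorite color"; a selector (1 to 3, default 2) under "On the scale 1 - 3, how favorite is it?"; buttons "I'm learning", "Give me a break!", and "Reset"]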

Feel free to modify the form in any way you want (just copy its source code above), and try it again. But do me a favor: Do it from your own computer. Please do not place it on a web page unless you write your own program to test it with. Please understand that I have a bandwidth limit with my host and it might cost me extra money if you let the whole world use my CGI program from my server. By the way, if you clicked RELOAD while you were in the c program, your browser probably reacted differently when the data was sent by POST from when it was sent by GET. If you did not do that, go back and try it!

Running "Hello, world!" as a CGI Script


This section of the tutorial covers:

Content-type headers
Here-document quoting
File locations/extensions for running CGI scripts
Testing from the command line
Testing from the Web server
CGI script file permissions

Content-type headers
Now let's modify hello.pl so it will run as a CGI script. Every CGI script needs to output a special header as the first thing the script outputs. This header line is checked by the Web server, then passed on to the remote user invoking the script in order to tell that user's browser what type of file to expect. Most of the time, your script is going to output an HTML file, which means you'll need to output the following header:
print "Content-type: text/html\n\n";

You need to output it exactly like that, including the capital "C" and the lowercase everything else. Please note that there are two newline characters (\n\n) at the end of the header. CGI novices tend to forget that, but it's really important, since the header needs to be followed by a blank line.

So, adding that line to our hello.pl script gives us the following:
#!/usr/local/bin/perl

# hello.pl -- my first perl script!

print "Content-type: text/html\n\n";

print "Hello, world!\n";


Here-document quoting
As long as we're claiming this is HTML that we're outputting, let's go ahead and make our output a valid HTML file:
#!/usr/local/bin/perl

# hello.pl -- my first perl script!

print "Content-type: text/html\n\n";

print <<"EOF";
<HTML>

<HEAD>
<TITLE>Hello, world!</TITLE>
</HEAD>

<BODY>
<H1>Hello, world!</H1>
</BODY>

</HTML>
EOF

Take a careful look at the stuff that replaced the "" characters used to quote the original "Hello, world!\n" line. That <<"EOF"; thing, and the EOF all alone on a line by itself at the end, is being used to quote a multi-line string. Basically, it's being used to indicate what the "print" command should print. This is sometimes called "here-document" quoting; you can call it whatever you want, but it's a real time-saver in CGI scripts. There's nothing special about the "EOF" string I used to delimit my output, by the way; you can use anything you like, as long as it's the exact same at the beginning and end of the quoted string (including capitalization). So I could have said:

print <<"Walnuts";
Some stuff I want to have printed...
Walnuts

and it would have worked fine. Just make sure, again, that "Walnuts" is all by itself on the last line. Even a space character after it will screw things up, as will anything in front of it. It needs to be right at the left margin, with nothing after it but a newline.

File locations/extensions for running CGI scripts


There's one more thing we need to do in order to run hello.pl as a CGI script: we need to let the Web server know it's a CGI script. How you do this depends on how your ISP has configured their Web server. The two most common ways are to change the file's name so it ends with a ".cgi" extension, or to place the file in a special directory on the server called "/cgi" or "/cgi-bin". Ask your ISP how the server is configured, and proceed accordingly. If you want to go ahead and try sticking a ".cgi" extension on the end of your script, be my guest; the worst that will happen if your Web server doesn't support that convention is that it will simply deliver the text of the script to your browser, rather than executing it. Like I said, ask your ISP.

For this discussion, I'm going to assume that you can run a CGI script in any directory in your Web space, as long as the script has a ".cgi" extension. So let's copy the script to an appropriate directory, and change its extension to ".cgi":
catlow:/u1/j/jbc> mkdir /w1/l/lies/begperl
catlow:/u1/j/jbc> chmod 755 /w1/l/lies/begperl
catlow:/u1/j/jbc> cp hello.pl /w1/l/lies/begperl
catlow:/u1/j/jbc> cd /w1/l/lies/begperl
catlow:/w1/l/lies/begperl> mv hello.pl hello.cgi

Did you follow that? First I used the Unix "mkdir" command to create a new directory called "begperl" in my Web space, which in my particular case is located at "/w1/l/lies". (Trivia: The Unix mkdir command is the only one I can think of that is actually longer than the equivalent DOS command, "md".) I chmodded the directory to permissions 755, then used the "cp" command to copy hello.pl from my home directory to the new directory in my Web space. Then I used the "cd" command to change to that directory, and used the "mv" command to change the file's name from "hello.pl" to "hello.cgi", so the Web server will know that it's a CGI script.

A couple of additional things about directories and Web space and so on: My Unix command prompt has been customized to show my current directory; if yours hasn't, you can use the command "pwd" to list the current directory any time you lose track. Another handy tool is the "~" symbol; when you use that in a command in the Unix shell, it is automatically replaced by the path of your "home" directory, which is the directory you start off in when you first log into the server.

Many ISPs, by the way, use a convention of having users' Web stuff go in a directory called "public_html" beneath their home directory. If that's the case with your ISP, then you'll need to substitute directory names accordingly in the instructions given above, and use a Web address of the form "http://www.your-isp.com/~username/" or "http://www.your-isp.com/username/" to access documents in that public_html directory.

Testing from the command line


Before we try to run the script via the Web server, let's try running it from the command line, just to make sure everything works the way we want it to:
catlow:/w1/l/lies/begperl> ./hello.cgi
Content-type: text/html

<HTML>
<HEAD>
<TITLE>Hello, world!</TITLE>
</HEAD>
<BODY>
<H1>Hello, world!</H1>
</BODY>
</HTML>

Did you get that? Great! If not, what happened? Did you get a "permission denied" error, by any chance? Then check your permissions: You probably lost the "execute" permission somewhere along the way.

Testing from the Web server


Assuming your script did print out from the command line, let's go to the final step: testing the script via the Web server. In my case, that means typing the following into my Web browser's Location box:
http://www.lies.com/begperl/hello.cgi

In your case, it might be something like "http://www.your_isp.com/~your_username/hello.cgi". Whatever it is, go ahead and try it.
Internal Server Error

The server encountered an internal error or misconfiguration and was unable to complete your request. Please contact the server administrator, jbc@cyberverse.com and inform them of the time the error occurred, and anything you might have done that may have caused the error.

Ack. The dreaded "internal server error". You will see messages like this a lot when you are learning to run CGI scripts. What it means is that the Web server didn't get a proper "Content-type: text/html\n\n" header from your script: either something (probably some sort of error message) got printed out before the header, or the script never managed to run at all. It sure would be helpful if you could see that error message.

Fortunately, since your ISP is enlightened enough to let you run your own CGI scripts, they're almost certainly enlightened enough to give you access to the Web server's error log. In my case, it's at /w1/l/lies/.logs/error.log. One way to check that error log is to open up a second telnet window, log into a second shell session on your Unix server, and enter the command "tail -f /path/to/error.log". This will cause your window to display new entries in the error log as they are added. Then you can just pop back and forth from one window to the other to check the error log as you work on your script.

In this case, I'm just going to use my existing shell session, and issue the "tail" command without the -f switch to print out the last 10 lines of the error log, looking for the problem that caused my script to fail. Lo and behold, there it is:
catlow:/w1/l/lies/begperl> tail /w1/l/lies/.logs/error.log
(stuff deleted)
exec of /w1/l/lies/begperl//hello.cgi failed, reason: Permission denied (errno = 13)
[Fri Sep 11 23:50:18 1998] access to /w1/l/lies/begperl//hello.cgi failed for 207.71.222.193, reason: Premature end of script headers

So it was another "Permission denied" error. But wait; the script ran fine when I ran it manually from the command line. What gives?

CGI script file permissions


What gives is that when I ran it from the command line I ran it as me, the file's owner. But when I ran it as a CGI script using the Web server, I ran it as somebody else. This is a key point to understand: When a Web server runs your script, it's not the same thing as you running your script. That CGI script, by definition, is accessible to every obnoxious would-be hoodlum on the Internet, so when the Web server runs it, the assumption is that it is being run by someone who is malicious, stupid, or both. For this reason, Web servers are typically configured to run CGI scripts as a special, underprivileged user (often called "nobody"; another quaint Unixism that strikes me as funny) in order to minimize any damage that might be done.

In point of fact there isn't any real harm that this particular script can do, so we don't need to worry about the security implications of letting John Q. Public run it. To let him do so, though, we need to turn on the "read" and "execute" permissions for the "everybody else in the world" category of user. And we may as well turn on the same permissions for the file's group while we're at it, yielding a permissions setting of "755", which is the setting we're going to use for most of the CGI scripts we create from here on out:
catlow:/w1/l/lies/begperl> chmod 755 hello.cgi
catlow:/w1/l/lies/begperl> ls -l hello.cgi
-rwxr-xr-x 1 lies www 217 Sep 11 23:42 hello.cgi
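If the connection between "755" and that "-rwxr-xr-x" string isn't obvious, the following sketch may help. It assumes two made-up helpers, mode_string and world_can_run, which are mine and not part of Perl or of hello.cgi. The mode 0700 is just a hypothetical example of a file that only its owner can run.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn a numeric mode such as 0755 into the rwx
# string that "ls -l" shows (without the leading file-type character).
sub mode_string {
    my ($mode) = @_;
    my @triplet = qw(--- --x -w- -wx r-- r-x rw- rwx);
    # Owner bits are shifted by 6, group bits by 3, "other" bits by 0.
    return join '', map { $triplet[ ($mode >> $_) & 7 ] } (6, 3, 0);
}

# Hypothetical helper: true when the "everybody else" category has both
# read (4) and execute (1) permission.
sub world_can_run {
    my ($mode) = @_;
    return ($mode & 05) == 05 ? 1 : 0;
}

for my $mode (0700, 0755) {
    printf "%04o  %s  %s\n", $mode, mode_string($mode),
        world_can_run($mode) ? "other users can run it"
                             : "other users cannot run it";
}

# Optionally report on real files named on the command line.
for my $file (@ARGV) {
    my @info = stat($file);
    unless (@info) {
        warn "Cannot stat $file: $!\n";
        next;
    }
    my $mode = $info[2] & 07777;   # keep only the permission bits
    printf "%s: %04o (%s)\n", $file, $mode, mode_string($mode);
}
```

Each digit of the mode covers one category of user (owner, group, everybody else), and each digit is the sum of 4 for read, 2 for write, and 1 for execute. That is why mode_string(0755) comes out as "rwxr-xr-x", the same string shown in the "ls -l" listing above.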

Now let's try running hello.cgi via the Web server again, either by hitting the "Reload" button in our browser, or just typing the script's address into the Location box again and hitting "Enter":
Hello, world!

All right! Take a break and pour yourself a tall, frosty one. You've earned it.