
CS 306 Linux Programming Fall 2013

Lab #1 (Due: 9/25/13 by 3:00pm)


The subject of your first programming assignment is writing a C program to duplicate (a subset of) the functionality of the Linux/UNIX tr command. This program will use only C standard library functions, with no system calls.

tr is a command that goes through every character in the input stream, either translating particular characters (to different characters) or deleting particular characters. It is considered a filter because it reads from standard input and writes to standard output, so it can be used in shell pipelines. Your program will be simpler than tr in two ways: it need accept only the -d option, and it need accept only a limited set of interpreted characters. The syntax for your mytr is to be:

    mytr [-d] SET1 [SET2]

The SETs are specified as strings of characters. Most characters are taken literally, but tr uses a number of interpreted characters, which are character sequences that denote one or more different characters. For example, the two-character sequence \n is interpreted as denoting the newline character, and the notation [:alnum:] is interpreted as the set of all alphanumeric characters. See the man page for tr to see the entire set of interpreted characters. For this assignment, you need handle only the following set of four interpreted characters: \\, \n, \r, \t.

Here are examples of syntactically valid calls to mytr:

    mytr abcd ABCD                            (reads from stdin, use ctrl-d to terminate input)
    mytr "\t" " " < filewtabs > filewspaces
    mytr -d "\r"                              (a simple dos2unix)

You are to submit your code for the assignment in a single file named lab1.c. The file must be able to compile and run on its own, which means it must contain a main and any needed functions. main must be set up to take command-line arguments using the C argc and argv mechanism. Your code should assume that the executable created will be named mytr.

The syntax for mytr means that all valid calls will have exactly two command-line arguments. They will either both be character sets, or the first argument will be -d and the second argument will be a character set. The arguments will be available as C strings, via argv[1] and argv[2].

Your code is to include and use at least two functions, with prototypes:

    char *interpret_set(const char *input_set)

input_set: string whose chars represent a character set given as an argument to the command; return is a new string whose chars are the interpreted/modified character set (i.e., character sequences converted to their corresponding C chars).

    int charpos(const char character, const char *char_set)

character: a char to try to match; char_set: (already interpreted) character set string; return is an int representing the 0-based position of character in char_set, or -1 (negative one) if character is not in char_set.
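The charpos() contract described above can be illustrated with a short sketch. This is only one possible shape for the function, not a reference solution; it assumes char_set is a valid NUL-terminated string and returns the position of the first match if a character appears more than once.

```c
/* Return the 0-based index of character in char_set, or -1 if it is absent.
 * Assumption: char_set is a NUL-terminated, already interpreted set. */
int charpos(const char character, const char *char_set)
{
    for (int i = 0; char_set[i] != '\0'; i++) {
        if (char_set[i] == character)
            return i;   /* first match wins */
    }
    return -1;          /* reached the terminator without a match */
}
```

For example, charpos('c', "abcd") yields 2, while charpos('z', "abcd") yields -1.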

interpret_set() is to take a character set argument string as input, and convert any of the specified interpreted character sequences to their corresponding single char. E.g., the two-char sequence consisting of \ followed by t is to be interpreted as a tab character, so these two chars are to be replaced by the single C char '\t' (tab). Note that the tab char can also be represented as \011, since 011 is the ASCII octal code for tab. The return is to be a legal C string, whose char elements do not require any interpretation (i.e., each char in the string is a member of the character set).

The basic logic for main should be as follows:
1. examine the command line arguments to determine whether the -d option was specified or not, and thus which case is being called for: translate vs. delete;
2. use interpret_set() to interpret the character set(s) given in the arguments;
3. if delete case, read through the chars in the standard input stream, writing those that do not have a match in the character set out to standard output;
4. if translate case, read through the chars in the standard input stream, writing each char to standard output unchanged if it does not have a match in the first character set, else writing its translation char from the second character set if it does have a match;
5. exit with appropriate exit status.

The basic logic for interpret_set() should be as follows:
1. set up a string array large enough to copy input_set to;
2. loop through the chars in input_set;
3. each char that is not a \ should be copied to the output string;
4. if a \ is found, read the next char and determine what to copy to the output string;
5. ultimately, return the new string, containing the individual ASCII chars denoted by input_set.

The basic logic for charpos() should be as follows:
1. iterate through the chars in the char_set array looking for a match with character;
2. if a match is found, return the 0-based array index of the matching char in char_set;
3. otherwise, return -1.
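The interpret_set() steps listed above might be sketched as follows. This is a hedged illustration rather than the expected solution: the result is allocated with malloc() so that it remains valid after the function returns, and two behaviors that the assignment does not specify are assumptions of this sketch (an unrecognized escape keeps the char after the backslash, and a lone trailing backslash is copied literally).

```c
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated string with \\, \n, \r and \t sequences replaced
 * by the single chars they denote, or NULL if allocation fails.
 * The caller is responsible for calling free() on the result. */
char *interpret_set(const char *input_set)
{
    /* The output is never longer than the input, so strlen + 1 is enough. */
    char *out = malloc(strlen(input_set) + 1);
    if (out == NULL)
        return NULL;

    size_t j = 0;
    for (size_t i = 0; input_set[i] != '\0'; i++) {
        char c = input_set[i];
        if (c == '\\' && input_set[i + 1] != '\0') {
            i++;                         /* consume the char after the backslash */
            switch (input_set[i]) {
            case '\\': c = '\\'; break;
            case 'n':  c = '\n'; break;
            case 'r':  c = '\r'; break;
            case 't':  c = '\t'; break;
            default:                     /* assumption: unknown escape */
                c = input_set[i];
                break;
            }
        }
        out[j++] = c;                    /* lone trailing backslash lands here too */
    }
    out[j] = '\0';
    return out;
}
```

With this sketch, the three-char argument a, \, t comes back as the two-char string "a" followed by a tab.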
A key issue in implementing interpret_set() is allocating memory to hold the string that is to be returned. Memory to hold the string/set must be allocated in interpret_set(). However, there are potential pitfalls when doing this. One pitfall is that in a function, memory for local variables is allocated on the stack, and can be automatically reclaimed/reused once the function returns. This means that one should never return pointers to these variables from a function (and remember that strings are arrays, and arrays are represented by a pointer to the array start). While the compiler will allow you to do this, it is a logic error!

An alternative is to declare the new string array to be static. This causes memory to be allocated in a special part of the address space rather than on the stack, making it safe to return a pointer to this memory. Unfortunately, the size of static arrays must be known at compile time (their size must be declared with a constant). The size of the character sets will only be known at run time, and while we might be able to always allocate very large arrays and have them work most of the time, this is clearly not a good approach.

The proper way to solve the array allocation problem is to use dynamic memory via malloc() and related calls. These calls allocate memory from the heap rather than the stack, and heap memory is not automatically reclaimed; it must be manually reclaimed using free(). This means that you can return its address from a function.

Additional instructions and requirements:

You are to use C library functions (from <stdio.h>) to handle all I/O. The functions you will need include: getchar(), putchar(), printf(), and fprintf(). Remember that your program is to be reading from standard input, denoted as stdin, and writing to standard output, denoted as stdout.

You will probably want to use one or more functions from the C string library (<string.h>), such as strlen().

Your program must error check the original call and all library calls except those printing error messages. In the case of an error in the original call (number of command-line arguments), the output should be a usage message (though tr does something else):

    Usage: mytr [-d] SET1 [SET2]

If an error occurs while calling getchar() or putchar(), print the standard system message (using either perror() or strerror()).
Note that it is extremely difficult to cause these errors to occur, so testing whether your code properly prints an error message is not straightforward. You may wish to duplicate the error message format that many Linux/UNIX programs use, which is to prefix the message with the name of the program followed by a colon, give an informative message followed by a colon, and end with the system error message, e.g.:

    mytr: error reading from input stream: Invalid argument

All error messages are to go to standard error rather than to standard output.

Note that getchar() has an int return type because while it normally returns the next character (char) in the file, it returns EOF when it reaches the end of the file or gets an error trying to read from the file (EOF is normally the int value -1). Whether getchar() has returned EOF due to end-of-file (normal) or due to a read error can be distinguished by using either ferror() or feof().

main must ultimately return an appropriate exit status. Normally this should be either EXIT_SUCCESS or EXIT_FAILURE. Read the man page (and/or experiment with tr) to see what the appropriate exit status is. Obviously if there is a read/write error, the exit status should indicate failure.
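The EOF handling and error reporting described above could look roughly like the sketch below. The helper name copy_stream() and its parameters are hypothetical, introduced only for illustration. It is written against FILE pointers so that it can be exercised with any stream; getc(stdin) and putc(c, stdout) behave like getchar() and putchar(c). It also assumes a POSIX-like system on which a failed read or write sets errno, which the C standard itself does not guarantee.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy every char from in to out.
 * Returns 0 on success, or -1 after printing a message to stderr
 * when a read or write error occurs. */
int copy_stream(FILE *in, FILE *out, const char *prog)
{
    int c;  /* int rather than char, so EOF can be told apart from data */

    while ((c = getc(in)) != EOF) {
        if (putc(c, out) == EOF) {
            fprintf(stderr, "%s: error writing to output stream: %s\n",
                    prog, strerror(errno));
            return -1;
        }
    }

    /* EOF was returned: ferror() separates a read error from a normal end. */
    if (ferror(in)) {
        fprintf(stderr, "%s: error reading from input stream: %s\n",
                prog, strerror(errno));
        return -1;
    }
    return 0;
}
```

A main built on this idea would map a return of 0 to EXIT_SUCCESS and -1 to EXIT_FAILURE.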

You are to submit your file electronically from the CS Dept. Linux workstations, so your file must be stored on your CS Linux account. SSH to pc00 or pc01, cd to the directory your file is stored in, then type cs306submit lab1.c to submit your code. Submissions are timestamped, and no late submissions will be accepted unless the due date/time is extended for the entire class. Every time you execute the submit command it uploads the latest file and overwrites any previous version you may have submitted.

Remember that you must get at least a 33 on the lab for admission to the relevant exam. Labs that do not compile, or that compile but immediately fail when run, will generally receive automatic zero scores, so start early and test your program well. You should not be getting any warnings from GCC, so make certain you check for this, as most warnings indicate that there are serious problems with the code. Since we still have to look at various parts of your code to grade it, code must be appropriately indented and of a reasonable style.

You are strongly urged to develop your program in stages rather than typing the whole thing in and then trying to get it working. Most complete programs will initially have multiple interacting bugs, making debugging potentially very difficult. The goal when working in stages is to make small enough changes so that only very few things could be wrong at any one time, making it easier to identify what is causing an error. While there are many ways you might proceed, one possible sequence of stages is:
1. Write a main that simply reads through standard input and writes all chars to standard output. Make sure you understand how to stop at EOF for input. Also make sure you understand how to call your mytr in a pipeline and with redirection.
2. Now modify the program to accept and decode the command line arguments, executing one branch if deleting and another if translating. For now, simply print out the case and set(s) (prior to looping through the input stream).
3. At this point, you can implement charpos() and work it into the logic for looping through the input stream. Note that if you don't use any interpreted characters in your sets, you don't need to call interpret_set(), so simply test this stage with sets that contain only literal characters.
4. To complete the program, implement interpret_set() and insert calls to it into your existing logic.

Extensions (for more challenges and for CS 491 students):
1. Handle additional tr interpreted characters: \NNN and CHAR1-CHAR2.
2. Implement tr's -c (complement) option.
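As an illustration of stage 2 in the staged plan above, the command-line decoding might be sketched like this. The enum and the function name parse_mode() are hypothetical names chosen for this example; the sketch assumes, as the assignment states, that every valid call has exactly two arguments after the program name.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical result type for this sketch. */
enum mode { MODE_USAGE_ERROR = -1, MODE_TRANSLATE = 0, MODE_DELETE = 1 };

/* Decide which case is being requested. Prints the usage message to
 * standard error when the argument count is wrong. */
enum mode parse_mode(int argc, char *argv[])
{
    if (argc != 3) {   /* program name plus exactly two arguments */
        fprintf(stderr, "Usage: mytr [-d] SET1 [SET2]\n");
        return MODE_USAGE_ERROR;
    }
    if (strcmp(argv[1], "-d") == 0)
        return MODE_DELETE;      /* argv[2] is the set to delete */
    return MODE_TRANSLATE;       /* argv[1] is SET1, argv[2] is SET2 */
}
```

During stage 2, main could simply print the returned case together with argv[1] and argv[2] before the input loop is added.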
