
Chapter Five

Advanced File Processing

Guide to UNIX Using Linux
Fourth Edition
Chapter 5 Unix (34 slides)

CTEC 110

Objectives

Use the pipe operator to redirect the output of one command to another command
Use the grep command to search for a specified pattern in a file
Use the uniq command to remove duplicate lines from a file
Use the comm and diff commands to compare two files
Use the wc command to count words, characters, and lines in a file

Objectives (continued)

Use manipulation and transformation commands, which include sed, tr, and pr
Design a new file-processing application by creating, testing, and running shell scripts

Advancing Your
File-Processing Skills

Selection commands focus on extracting specific information from files

Advancing Your
File-Processing Skills (continued)

Manipulation and transformation commands alter and transform extracted information into useful and appealing formats

Using the Selection Commands

Using the Pipe Operator


The pipe operator (|) redirects the output of one command to the input of another
An example would be to redirect the output of the ls command to the more command
The pipe operator can connect several commands on the same command line
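
As a minimal sketch of the points above, the following connects two and then three commands with pipes. Because more is interactive, the sketch pipes ls into wc and grep instead; the scratch directory and file names are hypothetical.

```shell
# Work in a scratch directory so the listing is predictable (file names are hypothetical).
workdir=$(mktemp -d)
touch "$workdir/notes.txt" "$workdir/report.txt" "$workdir/script.sh"

# Two commands: ls writes the listing, wc -l counts its lines.
file_count=$(ls "$workdir" | wc -l)

# Three commands on one line: list, keep names ending in .txt, count them.
txt_count=$(ls "$workdir" | grep '\.txt$' | wc -l)

echo "files: $((file_count)), text files: $((txt_count))"
rm -r "$workdir"
```

At an interactive prompt, ls -l | more pages through a long listing in the same way.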

Using the Pipe Operator


Using pipe operators and connecting commands is useful when viewing directory information

Using the grep Command

Used to search for a specific pattern in a file, such as a word or phrase
grep's options and wildcard support allow for powerful search operations
You can increase grep's usefulness by combining it with other commands, such as head or tail
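
A short sketch of these ideas, assuming a hypothetical file named desserts.txt: a plain search, a search with options, and grep combined with head.

```shell
# Sample data; the file name and contents are hypothetical.
workdir=$(mktemp -d)
printf '%s\n' 'apple pie' 'Banana split' 'apple tart' 'cherry cake' 'apple crumble' \
    > "$workdir/desserts.txt"

# Every line that contains the pattern "apple".
matches=$(grep 'apple' "$workdir/desserts.txt")

# -i ignores case; -c prints the number of matching lines instead of the lines.
banana_count=$(grep -ic 'banana' "$workdir/desserts.txt")

# grep combined with head: keep only the first two matching lines.
first_two=$(grep 'apple' "$workdir/desserts.txt" | head -n 2)

echo "$first_two"
rm -r "$workdir"
```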

Using the uniq Command

Removes duplicate lines from a file
Compares only consecutive lines; therefore, uniq requires sorted input
uniq has an option (-d) that generates output containing one copy of each line that has a duplicate
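
The sketch below illustrates why sorted input matters and what the -d option reports. The file fruit.txt and its contents are hypothetical.

```shell
# Sample data with duplicates that are not adjacent (hypothetical file).
workdir=$(mktemp -d)
printf '%s\n' 'pear' 'apple' 'pear' 'fig' 'apple' 'pear' > "$workdir/fruit.txt"

# Unsorted input: no two identical lines are consecutive, so uniq removes nothing.
unsorted_count=$(uniq "$workdir/fruit.txt" | wc -l)

# Sort first so that duplicates become consecutive, then remove them.
unique_lines=$(sort "$workdir/fruit.txt" | uniq)

# -d prints one copy of each line that occurs more than once.
repeated_lines=$(sort "$workdir/fruit.txt" | uniq -d)

echo "$unique_lines"
rm -r "$workdir"
```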

Using the comm Command

Used to identify duplicate lines in sorted files
Unlike uniq, it does not remove duplicates, and it works with two files rather than one
It compares file1 and file2 line by line and produces three-column output
Column one contains lines found only in file1
Column two contains lines found only in file2
Column three contains lines found in both files
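
A minimal sketch of the three columns, using two hypothetical sorted files. The options -1, -2, and -3 suppress the corresponding column, which makes each column easy to capture on its own.

```shell
# Two sorted sample files (names and contents are hypothetical).
workdir=$(mktemp -d)
printf '%s\n' 'apple' 'banana' 'cherry' > "$workdir/file1"
printf '%s\n' 'banana' 'cherry' 'date'  > "$workdir/file2"

# Full three-column output; columns two and three are indented with tabs.
comm "$workdir/file1" "$workdir/file2"

# Suppress columns 2 and 3: lines found only in file1.
only_file1=$(comm -23 "$workdir/file1" "$workdir/file2")

# Suppress columns 1 and 3: lines found only in file2.
only_file2=$(comm -13 "$workdir/file1" "$workdir/file2")

# Suppress columns 1 and 2: lines found in both files.
in_both=$(comm -12 "$workdir/file1" "$workdir/file2")

rm -r "$workdir"
```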

Using the diff Command

Attempts to determine the minimal changes needed to convert file1 to file2
The output displays the line(s) that differ
Codes in the output (a, d, and c) indicate that, in order for the files to match, specific lines must be added, deleted, or changed
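
As an illustration, the sketch below compares two small hypothetical files. In the default output format, a line such as 2d1 means that line 2 of the first file must be deleted, and 3a3 means that a line must be added after line 3 of the first file.

```shell
# Two sample files that differ by one deleted and one added line (hypothetical).
workdir=$(mktemp -d)
printf '%s\n' 'alpha' 'beta' 'gamma'  > "$workdir/file1"
printf '%s\n' 'alpha' 'gamma' 'delta' > "$workdir/file2"

# diff exits with status 1 when the files differ, so "|| true" keeps that
# from being treated as a failure in scripts that stop on errors.
changes=$(diff "$workdir/file1" "$workdir/file2" || true)

# Lines starting with "<" come from file1; lines starting with ">" come from file2.
echo "$changes"
rm -r "$workdir"
```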

Using the wc Command

Used to count the number of lines, words, and bytes or characters in text files
You may specify all three options in one issuance of the command
If you don't specify any options, you see counts of lines, words, and characters (in that order)

Using the wc Command (continued)

The options for the wc command:
-l for lines
-w for words
-c for bytes, which equal characters in single-byte text (-m counts characters)
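
A brief sketch of each option on a hypothetical two-line file. Reading from standard input (with <) keeps the file name out of the output, so each result is just a number.

```shell
# Sample file: 2 lines, 5 words, 24 bytes (hypothetical contents).
workdir=$(mktemp -d)
printf 'one two three\nfour five\n' > "$workdir/sample.txt"

line_count=$(wc -l < "$workdir/sample.txt")   # lines
word_count=$(wc -w < "$workdir/sample.txt")   # words
byte_count=$(wc -c < "$workdir/sample.txt")   # bytes

# With no options, wc prints all three counts followed by the file name.
wc "$workdir/sample.txt"

rm -r "$workdir"
```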

Using Manipulation and
Transformation Commands

These commands are: sed, tr, pr
Used to edit and transform the appearance of data before it is displayed or printed

Introducing the sed Command

sed is a UNIX/Linux editor that allows you to make global changes to large files
Minimum requirements are an input file and a command that lets sed know what actions to apply to the file
sed commands have two general forms
Specify an editing command on the command line
Specify a script file containing sed commands
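
Both forms can be sketched as follows. The input file notes.txt and the script file edits.sed are hypothetical names; -f is the sed option that reads commands from a script file.

```shell
# Sample input (hypothetical file name and contents).
workdir=$(mktemp -d)
printf '%s\n' 'UNIX is portable' 'I use UNIX daily; UNIX scales' > "$workdir/notes.txt"

# Form 1: the editing command is given on the command line.
# s/UNIX/Linux/g replaces every occurrence of UNIX on each line.
inline_result=$(sed 's/UNIX/Linux/g' "$workdir/notes.txt")

# Form 2: the commands live in a script file that is passed with -f.
# This script makes the same substitution, then deletes line 1.
printf '%s\n' 's/UNIX/Linux/g' '1d' > "$workdir/edits.sed"
script_result=$(sed -f "$workdir/edits.sed" "$workdir/notes.txt")

echo "$inline_result"
rm -r "$workdir"
```

Neither form changes notes.txt itself; sed writes the edited text to the standard output.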

Translating Characters
Using the tr Command

tr copies data from the standard input to the standard output, substituting or deleting characters specified by options and patterns
The patterns are strings and the strings are sets of characters
A popular use of tr is converting lowercase characters to uppercase
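
A small sketch of substitution and deletion with tr. The input strings are made up for illustration; tr reads only from the standard input, so the data arrives through a pipe.

```shell
# Substitute: map each lowercase letter to its uppercase counterpart.
upper=$(printf 'hello world\n' | tr '[:lower:]' '[:upper:]')

# Delete: -d removes every character in the given set (here, the digits).
no_digits=$(printf 'abc123def\n' | tr -d '0-9')

echo "$upper"
echo "$no_digits"
```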

Using the pr Command to
Format Your Output

pr prints specified files on the standard output in paginated form
By default, pr formats the specified files into single-column pages of 66 lines
Each page has a five-line header containing the file name, its latest modification date, and current page, and a five-line trailer consisting of blank lines
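
The defaults described above can be checked with a short sketch. The file report.txt is hypothetical; the third line of pr's output is the header line that carries the date, file name, and page number.

```shell
# A three-line sample file (hypothetical name and contents).
workdir=$(mktemp -d)
printf '%s\n' 'first line' 'second line' 'third line' > "$workdir/report.txt"

# Even a short file is padded out to one full page of output.
page_lines=$(pr "$workdir/report.txt" | wc -l)

# The header text sits on line 3, between blank lines.
header_line=$(pr "$workdir/report.txt" | sed -n '3p')

echo "lines per page: $((page_lines))"
echo "$header_line"
rm -r "$workdir"
```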

Designing a New
File-Processing Application

The most important phase in developing a new application is the design
The design defines the information an application needs to produce
The design also defines how to organize this information into files, records, and fields, which are called logical structures

Designing Records

The first task is to define the fields in the records and produce a record layout
A record layout identifies each field by name and data type (numeric or nonnumeric)
Design the file record to store only those fields relevant to the record's primary purpose

Linking Files with Keys

Multiple files are joined by a key: a common field that each of the linked files shares
Another important task in the design phase is to plan a way to join the files
The flexibility to gather information from multiple files composed of simple, short records is the essence of a relational database system
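
One way to link two flat files by a key is the join command, sketched below. This is an illustration under assumptions: the file names, the colon-separated record layout, and the programmer number used as the key are all hypothetical. join expects both files to be sorted on the key field.

```shell
# Two flat files that share a key in field 1 (hypothetical layout and data).
workdir=$(mktemp -d)
printf '%s\n' '101:Ada' '102:Grace' '103:Linus' > "$workdir/programmers"
printf '%s\n' '101:Compiler' '103:Kernel'       > "$workdir/projects"

# -t: sets the field separator; by default join matches on field 1 of each file.
# Only keys present in both files appear in the output.
linked=$(join -t: "$workdir/programmers" "$workdir/projects")

echo "$linked"
rm -r "$workdir"
```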

Creating the Programmer
and Project Files

With the basic design complete, you now implement your application design
UNIX/Linux file processing predominantly uses flat files
Working with these files is easy, because you can create and manipulate them with text editors like vi and Emacs

Formatting Output

The awk command is used to prepare formatted output
For the purposes of developing a new file-processing application, we will focus primarily on the printf action of the awk command

awk provides a shortcut to other UNIX/Linux commands
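
A minimal sketch of the printf action, assuming a hypothetical colon-separated file of programmer numbers, names, and projects. The format %-6s prints a field left-justified in a column six characters wide.

```shell
# Sample records: number:name:project (hypothetical layout and data).
workdir=$(mktemp -d)
printf '%s\n' '101:Ada:Compiler' '103:Linus:Kernel' > "$workdir/assignments"

# -F: sets the input field separator; $1, $2, and $3 are the three fields.
# printf lines the fields up in fixed-width, left-justified columns.
report=$(awk -F: '{ printf "%-6s %-8s %s\n", $1, $2, $3 }' "$workdir/assignments")

echo "$report"
rm -r "$workdir"
```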

Using a Shell Script to
Implement the Application

Shell scripts should contain:
The commands to execute
Comments to identify and explain the script so that users or programmers other than the author can understand how it works
Use the pound (#) character to mark comments in a script file
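
As a sketch of a commented script, the example below writes a small script named count_lines.sh (a hypothetical name) to a scratch directory and runs it on a sample file. The comments identify the script and explain each command.

```shell
workdir=$(mktemp -d)

# Create the script file; the quoted 'EOF' keeps the text exactly as written.
cat > "$workdir/count_lines.sh" <<'EOF'
#!/bin/sh
# count_lines.sh: report how many lines a file contains.
# Usage: sh count_lines.sh FILE

file=$1                     # first argument: the file to examine
lines=$(wc -l < "$file")    # count the lines in that file
echo "$file has $((lines)) lines"
EOF

# Run the script on a three-line sample file.
printf '%s\n' 'a' 'b' 'c' > "$workdir/data.txt"
script_output=$(sh "$workdir/count_lines.sh" "$workdir/data.txt")

echo "$script_output"
rm -r "$workdir"
```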

Running a Shell Script

You can run a shell script in virtually any shell that you have on your system
The Bash shell accepts more variations in command structures than other shells
Run the script by typing sh followed by the name of the script, or make the script executable and type ./ prior to the script name
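
Both ways of running a script can be sketched as follows. The script hello.sh is hypothetical, and chmod +x is one common way to make a script executable.

```shell
# Create a tiny script in a scratch directory (hypothetical name and contents).
workdir=$(mktemp -d)
printf '%s\n' '#!/bin/sh' '# hello.sh: print a greeting' 'echo "hello from the script"' \
    > "$workdir/hello.sh"

# Option 1: type sh followed by the name of the script.
via_sh=$(cd "$workdir" && sh hello.sh)

# Option 2: make the script executable, then type ./ before its name.
chmod +x "$workdir/hello.sh"
via_exec=$(cd "$workdir" && ./hello.sh)

echo "$via_sh"
echo "$via_exec"
rm -r "$workdir"
```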

Putting it All Together to
Produce the Report

An effective way to develop applications is to combine many small scripts in a larger script file
Have the last script added to the larger script print a report indicating script functions and results

Chapter Summary

UNIX/Linux file-processing commands are (1) selection and (2) manipulation and transformation commands
uniq removes duplicate lines from a sorted file
comm compares lines common to file1 and file2
diff tries to determine the minimal set of changes needed to convert file1 into file2

Chapter Summary (continued)

tr copies data read from the standard input to the standard output, substituting or deleting characters specified
sed is a file editor designed to make global changes to large files
pr prints specified files on the standard output in paginated form

Chapter Summary (continued)

The design of a file-processing application reflects what the application needs to produce
Use a record layout to identify each field by name and data type
Shell scripts should contain commands to execute programs and comments to identify and explain the programs

Chapter 5 Unix Exercises


Work through Hands-on Projects at end of chapter 5
Canvas: Review Questions 5 (Do not do questions 22, 23, 24, and 25)

Quiz 5 Unix
