Академический Документы
Профессиональный Документы
Культура Документы
Copyright © 2008
Chapter 7 Image Processing 1
7.1 Preliminaries 1
7.1.1 Colors and the RGB System 2
7.1.2 Analog and Digital Information 3
7.1.3 Sampling and Digitizing Images 3
7.1.4 Image File Formats 4
7.2 Image Manipulation Operations 4
7.2.1 The Properties of Images 5
7.2.2 Object Instantiation 5
7.2.3 The images Module 6
7.2.4 A Loop Pattern for Traversing a Grid 8
7.2.5 A Word on Tuples 9
7.2.6 Converting an Image to Black and White 10
7.2.7 Converting an Image to Grayscale 11
7.2.8 Copying an Image 12
7.2.9 Blurring an Image 13
7.2.10 Edge Detection 14
7.2.11 Reducing the Image Size 15
Exercises 7.2 16
Summary 17
Review Questions 18
7.1 Preliminaries
Over the centuries, human beings have developed numerous technologies for
representing the visual world, the most prominent being painting, photography and
motion pictures. The most recent form of this type of technology is digital image
processing. This enormous field includes the principles and techniques for
• the capture of images with devices such as flatbed scanners and digital cameras
• the representation and storage of images in efficient file formats
• constructing the algorithms in image manipulation programs such as Adobe
Chapter 7 Image Processing 2
Copyright © 2008
Photoshop.
In this section, we focus on some of the basic concepts and principles used to solve
problems in image processing.
You might be wondering how many total RGB color values are at our disposal.
That number would be equal to all of the possible combinations of three values, each of
which has 256 possible values, or 256 * 256 * 256, or 16,777,216 distinct color values.
Although the human eye cannot discriminate between adjacent color values in this set,
the RGB system is called a true color system.
Another way to consider color is from the perspective of the computer memory
required to represent a pixel’s color. In general, N bits of memory can represent 2N
Chapter 7 Image Processing 3
Copyright © 2008
distinct data values. Conversely, N distinct data values require at least log2N bits of
memory. In the old days, when memory was expensive and displays came in black and
white, only a single bit of memory was required to represent the two color values. Thus,
when displays capable of showing 8 shades of gray came along, 3 bits of memory were
required to represent each color value. Early color monitors might support the display of
256 colors, so 8 bits were needed to represent each color value. Each color component of
an RGB color requires 8 bits, so the total number of bits needed to represent a distinct
color value is 24. The total number of RGB colors, 224, happens to be 16,777,216.
which is approximately one megapixel. For most purposes, however, we can settle for a
much lower sampling size and thus fewer pixels per square inch.
• Rotate an image
• Convert an image from color to grayscale
• Apply color filtering to an image
• Highlight a particular area in an image
• Blur all or part of an image
• Sharpen all or part of an image
• Control the brightness of an image
• Perform edge detection on an image
• Enlarge or reduce an image’s size
• Apply color inversion to an image
• Morph an image into another image
You’ll learn how to write Python code that can perform some of these
manipulation tasks later in this chapter, and have a chance to practice others in the
programming projects.
The expression on the right side of the assignment, also called a constructor, resembles a
function call. The constructor can receive as arguments any initial values for the new
object’s attributes, or other information needed to create the object. As you might expect,
Chapter 7 Image Processing 6
Copyright © 2008
if the arguments are optional, reasonable defaults are provided automatically. The
constructor then manufactures and returns a new instance of the class.
Before we discuss some standard image processing algorithms, let’s try out the
resources of the images module. We assume that the file images.py is located in the
current working directory. The current version of the images module accepts only
image files in GIF format. For the purposes of this exercise, we also assume that a GIF
image of my cat, Smokey, has been saved in a file named smokey.gif in the current
working directory. The following session with the interpreter does three things:
Chapter 7 Image Processing 7
Copyright © 2008
The resulting image display window is shown in Figure 7.1, although the actual image is
in color, with green grass in the background. In this book the colors are not visible.
Python raises an error if it cannot locate the file in the current directory, or the file is not a
GIF file. Note also that the user must close the window to return control to the caller of
the method draw. If you are working in the shell, the shell prompt will reappear when
you do this. The image can then be redrawn, after other operations are performed, by
calling draw again.
Once an image has been created, we can examine its width and height, as follows:
>>> image.getWidth()
198
>>> image.getHeight()
149
>>>
The method getPixel returns a tuple of the RGB values at the given
coordinates. The following session shows the information for the pixel at position (0, 0),
which is at the image’s upper left corner.
Chapter 7 Image Processing 8
Copyright © 2008
>>> image.getPixel(0, 0)
(198, 224, 117)
Instead of loading an existing image from a file, the programmer can create a
new, blank image. The programmer specifies the image’s width and height; the resulting
image consists of all white pixels. Such images are useful for creating backgrounds or
drawing simple shapes, or creating new images that receive information from existing
images.
The programmer can use the method setPixel to replace an RGB value at a
given position in an image. The next session creates a new 150 by 150 image. We then
replace the pixels along a horizontal line at the middle of the image with new, blue pixels.
The images before and after this transformation are shown in Figure 7.2. The loop visits
every pixel along the row of pixels whose y coordinate is the image’s height divided by 2.
Finally, an image can be saved under its current file name or a different file name.
The save operation is used to write an image back to an existing file using the current
file name. The save operation can also receive a string parameter for a new file name.
The image is written to a file with that name, which then becomes the current file name.
The following code saves our new image using the file name horizontal.gif:
>>> image.save("horizontal.gif")
If you omit the .gif extension in the file name, the method adds it automatically.
>>> width = 2
>>> height = 3
>>> for y in xrange(height):
... for x in xrange(width):
... print "(" + str(x) + "," + str(y) + ")",
... print
...
(0,0) (1,0)
(0,1) (1,1)
(0,2) (1,2)
>>>
As you can see, this loop marches across a row in an imaginary 2 by 3 grid, prints the
coordinates at each column, and then moves on to the next row. The following template
captures this pattern, which is called a row-major traversal. We use this template to
develop many of the algorithms that follow.
for y in xrange(height):
for x in xrange(width):
do something at position (x, y)
We can now see what the RGB values are by examining the variables:
>>> r
198
>>> g
224
>>> b
Chapter 7 Image Processing 10
Copyright © 2008
117
Our task is completed by building a new tuple with the results of the computations and
resetting the pixel to that tuple:
>>>
>>> image.setPixel(0, 0, (r + 10, g + 10, b + 10))
The elements of a tuple can also be bound to variables when that tuple is passed
as an argument to a function. For example, the function average computes the average
of the numbers in a 3-tuple as follows:
Armed with these basic operations, we can now examine some simple image
processing algorithms. Some of the algorithms visit every pixel in an image and modify
its color in some manner. Other algorithms use the information from an image’s pixels to
build a new image. For consistency and ease of use, we represent each algorithm as a
Python function that expects an image as an argument. Some functions return a new
image, whereas others simply modify the argument image.
def blackAndWhite(image):
"""Converts the argument image to black and white."""
blackPixel = (0, 0, 0)
whitePixel = (255, 255, 255)
for y in xrange(image.getHeight()):
for x in xrange(image.getWidth()):
(r, g, b) = image.getPixel(x, y)
average = (r + g + b) / 3
if average < 128:
image.setPixel(x, y, blackPixel)
else:
image.setPixel(x, y, whitePixel)
Chapter 7 Image Processing 11
Copyright © 2008
Note that the second image appears rather stark, like a woodcut.
Our function can be tested in a short script, as follows:
main()
Note that the main function includes an optional argument for the image file name. Its
default should be the name of an image in the current working directory. When loaded
from IDLE, main can be run multiple times with other file names to test the algorithm
with different images.
average = (r + g + b) / 3
image.setPixel(x, y, (average, average, average))
Although this method is simple, it does not reflect the manner in which the different color
components affect human perception. The human eye is actually more sensitive to green
and red than it is to blue. As a result, the blue component appears darker than the other
Chapter 7 Image Processing 12
Copyright © 2008
two components. A scheme that combines the three components needs to take these
differences in luminance into account. A more accurate method would weight green
more than red and red more than blue. Therefore, to obtain the new RGB values, instead
of adding the color values up and dividing by three, we should multiply each one by a
weight factor and add the results. Psychologists have determined that the relative
luminance proportions of green, red, and blue are .587, .299, and .114, respectively. Note
that these values add up to 1. Our next function, grayscale, uses this strategy, and
Figure 7.4 shows the results.
def grayScale(image):
"""Converts the argument image to grayscale."""
for y in xrange(image.getHeight()):
for x in xrange(image.getWidth()):
(r, g, b) = image.getPixel(x, y)
r = int(r * 0.299)
g = int(g * 0.587)
b = int(b * 0.114)
lum = r + g + b
image.setPixel(x, y, (lum, lum, lum))
A comparison of the results of this algorithm with those of the simpler one using the
crude averages is left as an exercise.
def blur(image):
"""Builds and returns a new image which is a blurred
copy of the argument image."""
new = image.clone()
for y in xrange(1, image.getHeight() – 1):
for x in xrange(1, image.getWidth() – 1):
oldP = image.getPixel(x, y)
left = image.getPixel(x - 1, y) # To left
right = image.getPixel(x + 1, y) # To right
top = image.getPixel(x, y - 1) # Above
bottom = image.getPixel(x, y + 1) # Below
sums = reduce(tripleSum, #2
[oldP, left, right, top, bottom])
averages = tuple(map(lambda x: x / 5, sums)) #3
new.setPixel(x, y, averages)
return new
The code for blur includes some interesting design work. In the following
explanation, the numbers referred to appear to the right of the corresponding lines of
code:
• At #1, the auxiliary function tripleSum is defined. This function expects two
tuples of integers as arguments and returns a single tuple containing the sums of the
values at each position.
Chapter 7 Image Processing 14
Copyright © 2008
• At #2, five tuples of RGB values are wrapped in a list and passed with the
tripleSum function to the reduce function. This function repeatedly applies
tripleSum to compute the sums of triples, until a single tuple containing the total
sums is returned.
• At #3, a lambda function is mapped onto the tuple of sums and the resulting list is
converted to a tuple. The lambda function divides each sum by 5. Thus, we are left
with a tuple of the average RGB values.
Although this code is still rather complex, try writing it without map and reduce and
compare the two versions.
blackPixel = (0, 0, 0)
whitePixel = (255, 255, 255)
new = image.clone()
for y in xrange(image.getHeight() – 1):
for x in xrange(image.getWidth()):
oldPixel = image.getPixel(x, y)
leftPixel = image.getPixel(x - 1, y)
bottomPixel = image.getPixel(x, y + 1)
oldLum = average(oldPixel)
leftLum = average(leftPixel)
bottomLum = average(bottomPixel)
if abs(oldLum - leftLum) > amount or \
Chapter 7 Image Processing 15
Copyright © 2008
abs(oldLum - bottomLum) > amount:
new.setPixel(x, y, blackPixel)
else:
new.setPixel(x, y, whitePixel)
return new
Figure 7.5 Edge detection: the original image, a luminance threshold of 10, and a
luminance threshold of 20
Reducing an image’s size throws away some of its pixel information. Indeed, the
greater the reduction, the greater the information loss. However, as the image becomes
smaller, the human eye does not normally notice the loss of visual information, and
therefore the quality of the image remains stable to perception.
The results are quite different when an image is enlarged. To increase the size of
an image, we have to add pixels that were not there to begin with. In this case, we try to
approximate the color values that pixels would receive if we took another sample of the
subject at a higher resolution. This process can be very complex, because we also have to
transform the existing pixels to blend in with the new ones that are added. Because the
image gets larger, the human eye is in a better position to notice a degrading of quality
when comparing it to the original. The development of a simple enlargement algorithm
is left as an exercise.
Although we have covered only a tiny subset of the operations typically
performed by an image processing program, these and many more use the same
underlying concepts and principles.
Exercises 7.2
1. Explain the advantages and disadvantages of lossless and lossy image file
compression schemes.
Chapter 7 Image Processing 17
Copyright © 2008
2. The size of an image is 1680 pixels by 1050 pixels. Assume that this image has been
sampled using the RGB color system and placed into a raw image file. What is the
minimum size of this file in megabytes? (Hint: There are 8 bits in a byte, 1024 bits
in a kilobyte, and 1000 kilobytes in a megabyte).
3. Describe the difference between Cartesian coordinates and screen coordinates.
4. Describe how a row-major traversal visits every position in a two-dimensional grid.
5. How would a column-major traversal of a grid work? Write a code segment that
prints the positions visited by a column-major traversal of a 2 by 3 grid.
6. Explain why one would use the clone method with a given object.
7. Why does the blur function need to work with a copy of the original image?
Summary
• Object-based programming uses classes, objects, and methods to solve problems.
• A class specifies a set of attributes and methods for the objects of that class.
• The values of the attributes of a given object comprise its state.
• A new object is obtained by instantiating its class. An object’s attributes receive their
initial values during instantiation.
• The behavior of an object depends on its current state and on the methods that
manipulate this state.
• The set of a class’s methods is called its interface. The interface is what a
programmer needs to know in order to use objects of a class. The information in an
interface usually includes the method headers and documentation about arguments,
return values, and changes of state.
• A class usually includes an __str__ method that returns a string representation of an
object of the class. This string might include information about the object’s current
state. Python’s str function calls this method.
• The RGB system represents a color value by mixing integer components that
represent red, green, and blue intensities. There are 256 different values for each
component, ranging from 0, indicating absence, to 255, indication complete
saturation. There are 224 different combinations of RGB components for 16,777,216
unique colors.
• A grayscale system uses 8, 16, or 256 distinct shades of gray.
• Digital images are captured by sampling analog information from a light source,
using a device such as a digital camera or a flatbed scanner. Each sampled color
value is mapped to a discrete color value among those supported by the given color
system.
• Digital images can be stored in several file formats. A raw image format preserves all
of the sampled color information, but occupies the most storage space. The JPEG
format uses various data compression schemes to reduce the file size while preserving
fidelity to the original samples. Lossless schemes either preserve or reconstitute the
original samples upon decompression. Lossy schemes lose some of the original
sample information. The GIF format is a lossy scheme that uses a palette of up to 256
colors and stores the color information for the image as indexes into this palette.
• During the display of an image file, each color value is mapped onto a pixel in a two-
dimensional grid. The positions in this grid correspond to the screen coordinate
system, in which the upper left corner is at (0, 0), and the lower right corner is at
Chapter 7 Image Processing 18
Copyright © 2008
(width – 1, height – 1).
• A nested loop structure is used to visit each position in a two-dimensional grid. In a
row-major traversal, the outer loop of this structure moves own the rows using the y-
coordinate and the inner loop moves across the columns using the x-coordinate. Each
column in a row is visited before moving to the next row. A column-major traversal
reverses these settings.
• Image manipulation algorithms either transform pixels at given positions or create a
new image using the pixel information of a source image. Examples of the former
type of operation are conversion to black and white and conversion to grayscale.
Blurring, edge detection, and altering the image size are examples of the second type
of operation.
Review Questions
1. The interface of a class is the set of all its
a. objects
b. attributes
c. methods
2. The state of an object consists of
a. its class of origin
b. the values of all of its attributes
c. its physical structure
3. Instantiation is a process that
a. compares two objects for equality
b. builds a string representation of an object
c. creates a new object of a given class
4. The str function
a. creates a new object
b. copies an existing object
c. returns a string representation of an object
5. The clone method
a. creates a new object
b. copies an existing object
c. returns a string representation of an object
6. The origin (0,0) in a screen coordinate system is at
a. the center of a window
b. the upper left corner of a window
7. A row-major traversal of a two-dimensional grid visits all of the positions in a
a. row before moving to the next row
b. column before moving to the next column
8. In a system of 256 unique colors, the number of bits needed to represent each color is
a. 4
b. 8
c. 16
9. In the RGB system, where each color contains three components with 256 possible
values each, the number of bits needed to represent each color is
a. 8
Chapter 7 Image Processing 19
Copyright © 2008
b. 24
c. 256
10. The process whereby analog information is converted to digital information is called
a. recording
b. sampling
c. filtering
d. compressing
1.