
Image and sound representation

Pixels - Numbers and text are stored in computer memory in binary. In this section we will learn how images (photographs and graphic images) can be stored in binary as well. Understanding how the binary system is used to represent digital images (images stored in binary) requires some background on how computer displays operate. All computer screens, monitors and television screens are made up of pixels: tiny picture dots laid out in a two-dimensional grid.

Resolution - One of the factors affecting the quality of a digital image is its resolution. The resolution of an image is the number of pixels used to represent the image. To think of it another way: to store an image such as a photograph digitally, the computer superimposes a grid on the photograph and stores only one color of information for each square of the grid. Obviously a grid with many little squares will give a better picture than a grid with a few big squares. The density of the squares in the grid is the resolution. You may hear resolution expressed as "dots per inch" (dpi), as in "72 dpi", or as a grid size, as in 640 x 480 pixels.

Bitmaps vs. vectors - There are two main formats in which images are stored in the computer: bitmap (or raster) and vector. With a bitmap, information is stored about each pixel of the image. With a vector representation, the components of the image are stored as mathematical formulas. A vector-based image can easily be scaled (made larger or smaller) without losing image quality, whereas a bitmap cannot. On the other hand, with image-editing programs you can edit the individual pixels of a bitmap image. In this module we will discuss only bitmap images.

Storing images - With a bitmap image, the image is stored in memory by recording some piece of binary data for each pixel. Recall that each pixel contains one "dot" of color: it has only one color. To store data for a pixel in memory, the information about its color must somehow be converted to a binary number. There are various ways to do this, some requiring more bits than others. The number of bits stored per pixel affects both the image quality and the amount of memory needed to store the image.

Storing bitmaps - One way to store an image is to color each pixel either black or white. Then each pixel can be stored using only 1 bit of data: a 1 for black or a 0 for white. A good example is a bitmap of the letter 'A', where the arrangement of black and white pixels creates a recognizable image of the letter.

Grayscale images - Another type of image is a grayscale image, in which each pixel contains a shade of gray. This is the representation used for black-and-white photographs. If the grayscale has 256 shades of gray, then for each pixel we need to store a binary number indicating one of those 256 shades. As always, the grayscale pixel information is stored in memory, perhaps row by row.
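The following short Python sketch (not part of the original module; the 7x5 pixel pattern for the letter 'A' is made up for illustration) shows how such a black-and-white bitmap can be packed into memory at 1 bit per pixel, and how much more space the same image would need as 8-bit grayscale.

# A minimal sketch: a made-up 7x5 black-and-white bitmap of the letter 'A'.
# Each character is one pixel: '1' = black, '0' = white.
letter_a = [
    "01110",
    "10001",
    "10001",
    "11111",
    "10001",
    "10001",
    "10001",
]

bits = "".join(letter_a)                               # 35 bits, row by row
packed = int(bits, 2).to_bytes((len(bits) + 7) // 8, "big")

print(f"1 bit per pixel : {len(bits)} pixels -> {len(packed)} bytes")
print(f"8-bit grayscale : {len(bits)} pixels -> {len(bits)} bytes (1 byte per pixel)")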

How will we convert the gray value - a number between 0 and 255 - to binary? How many bits will it take? Recall from Module 2 that with 8 bits we can represent unsigned binary numbers between 0 and 255. Therefore each pixel in a grayscale image can be stored using 1 byte (8 bits).

Color image formats - Of course, most images on the computer are not black and white but in color. How do we store data about color images? Again, each pixel has a single color, and we somehow need to store that color in binary. There are various ways to do this: 16 color, 256 color, and RGB color (or true color). Each requires a different number of bits and produces a different image quality.

Storing color images - With a 16 color image, every pixel has one of 16 colors. In graphic editing programs you can specify which 16 colors to use, tailoring the colors to your particular image. With only 16 possibilities, each pixel stores a value from 0 to 15, which can be done with 4 bits. With a 256 color image, every pixel has one of 256 colors; 256 different values can be stored in 8 bits.

RGB color - The format that is used most often and has the best color quality is RGB color, which can represent about 16.7 million different colors. Red, green, and blue are "additive colors": if we combine red, green and blue light we get white light. This is the principle behind the TV set in your living room and the monitor you are staring at now. Additive color, or RGB mode, is optimized for display on computer monitors and peripherals, most notably scanning devices. With RGB color, the color of each pixel is considered to be a mixture of shades of red, green, and blue, where the amount of each color is specified as a number between 0 and 255: 0 means none of that color and 255 means the highest amount of it. For example, (255, 255, 0) means a value of 255 for red (maximum red), 255 for green, and 0 for blue; this color is yellow. Shades of purple are mixtures of red and blue with no green; for example, (127, 0, 93) gives a purple, since it has 127 red, 0 green, and 93 blue. To handle all these colors, color displays are based on the true color RGB method: varying amounts of red, green, and blue are "mixed" together to produce the color of a specific pixel. One byte of information (256 variations) is needed to record the amount of red, another byte for the green, and another byte for the blue. This means that 3 bytes of information are needed for each display pixel. It also means that over 16.7 million different colors can be represented, which is more than the eye can distinguish. This method of specifying colors is also used in web programming. To store an RGB color image, 3 bytes (24 bits) are needed for each pixel, so these files can become large. For comparison, a 16 by 16 image (256 pixels) would require 768 bytes of storage (256 pixels x 3 bytes = 768 bytes). A comparable image in 256 color or 16 color would require 256 bytes or 128 bytes of memory respectively.
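A small Python sketch of this arithmetic (the image_size_bytes helper is invented here for illustration and is not part of any library):

# Storage needed for an uncompressed bitmap at a given color depth.
def image_size_bytes(width, height, bits_per_pixel):
    return (width * height * bits_per_pixel) // 8

for label, bits in [("RGB true color", 24), ("256 color", 8), ("16 color", 4)]:
    print(f"16 x 16 image, {label}: {image_size_bytes(16, 16, bits)} bytes")

# The RGB triples discussed above.
yellow = (255, 255, 0)    # maximum red + maximum green, no blue
purple = (127, 0, 93)     # some red and some blue, no green

Running it prints 768, 256 and 128 bytes, matching the figures above.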

CMYK - Cyan, magenta and yellow are "subtractive colors". If we print cyan, magenta and yellow inks on white paper, they absorb the light shining on the page. Since our eyes receive no reflected light from the paper, we perceive black... in a perfect world! The printing world operates in subtractive color, or CMYK mode. In practice, printing inks contain impurities that prevent them from absorbing light perfectly. They do a pretty good job with light colors, but when we add them all together they produce a murky brown rather than black. In order to get decent dark colors, black ink is added in increasing proportions as the color gets darker and darker. This is the "K" component in CMYK printing; "K" is used to indicate black instead of "B" to avoid possible confusion with blue ink.

Color modes - RGB is one of the two most widely used color modes and is used mainly for display on monitors. CMYK, however, is the right color mode for print.
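The relationship between the two color modes can be sketched in Python with the common textbook RGB-to-CMYK formula below. This is a naive conversion shown only for illustration; real print workflows rely on ICC color profiles rather than this simple arithmetic.

# Naive RGB -> CMYK conversion (textbook formula, not print-accurate).
def rgb_to_cmyk(r, g, b):
    # Normalize the 0-255 RGB values to the range 0-1.
    r, g, b = r / 255, g / 255, b / 255
    k = 1 - max(r, g, b)          # black generation
    if k == 1:                    # pure black: avoid dividing by zero
        return 0.0, 0.0, 0.0, 1.0
    c = (1 - r - k) / (1 - k)
    m = (1 - g - k) / (1 - k)
    y = (1 - b - k) / (1 - k)
    return c, m, y, k

print(rgb_to_cmyk(255, 255, 0))   # yellow -> (0.0, 0.0, 1.0, 0.0)
print(rgb_to_cmyk(0, 0, 0))       # black  -> (0.0, 0.0, 0.0, 1.0)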

File sizes and compression - With three bytes of binary data needed to describe each pixel (each pixel is defined by the amounts of its red, green and blue components) and a high-resolution screen (1600 by 1200 pixels), it does not take long to see how much memory is needed to display a single full-screen image: roughly 5.8 megabytes (5.76 million bytes). With those kinds of storage requirements, images could be a problem for computers, but this has been addressed through file compression techniques: methods that reduce image memory requirements without sacrificing significant picture quality.

Compression formats - Two of the better-known compression methods are JPEG and GIF. Photographs are better compressed using JPEG; art with areas of similar colors is better compressed as GIF. Graphic editing software allows images to be saved in either format. A JPEG file has a name ending in ".jpg" and a GIF file has a name ending in ".gif". An image for the web has usually been compressed using JPEG or GIF. There are different levels of compression: more compression means a smaller file, but past some point the picture quality is degraded.

Introduction to sound - Understanding how the binary system is used to represent sound requires a brief overview of the physical nature of sound. In simple terms, when an object, an instrument or a person vibrates (for example, by speaking), it changes the surrounding air pressure and creates a series of waves, which the brain interprets as sounds.

Sound wave forms - A sound wave can be represented electrically as an analog signal. It is not critical to know all the parts of an analog signal, but it is important to realize that these waveforms are converted into a digital format of 1's and 0's. The amplitude, as defined in the context of sound, is the height of the waveform at any point relative to the zero line (the standard signal level). The amplitude can be computed for parts of the wave below the zero line as well as above it, and it is measured in electrical voltage. Frequency is the number of cycles per second, measured in hertz (Hz).

Taking measurements - A microphone picks up the changes in air pressure and converts them into a waveform. A converter within the computer's sound card takes many readings of this waveform each second. These readings are positions (voltages, actually) on the wave relative to the zero line. They are recorded and then converted from decimal to binary numbers. Once converted, the readings can be stored in one of several audio formats such as WAV (file extension .wav) or AU (file extension .au).

Sampling - The process of measuring and recording the voltage of the signal is called sampling, and there are two main factors that determine its accuracy. One is how often the wave is measured, called the sampling rate; the other is how precisely each measurement is taken, known as the sampling precision. Obviously, taking more readings with a more precise unit of measurement results in a more accurate conversion (and, eventually, better sound reproduction).

Sampling rate - For a CD-quality recording, the analog signal must be read approximately 44,000 times a second, which generates a considerable amount of binary information. If the sampling rate drops below this level, the human ear detects distortions in the recording. When CDs are made, 0s are represented by tiny pits in the disc that scatter light, which a light-sensing diode interprets as a zero; the unpitted surface reflects light and is interpreted as a 1.

Sampling precision - In addition to the sampling rate, the more bits (or bytes) dedicated to each voltage reading, the better the sound reproduction. This is similar to our experience converting images to binary: when we used more bytes to represent colors, we had exponentially more colors in our palette. The same principle applies to sound. Two bytes of memory give us 2 to the power of 16, or 65,536, possible levels of measurement, compared with 1 byte, which can represent only 256 levels. A CD recording uses 2 bytes per sample; a DVD uses 3 bytes.

Music compression - As with images, recorded sound (and music) requires significant amounts of memory, but again there are data compression techniques that reduce file size without sacrificing quality. In audio, the most common compression technique is the MP3 file format, which reduces a sound file by approximately a factor of 10. If 176,000 bytes are required to record 1 second of CD-quality sound, then a 4-minute song requires approximately 40 MB of storage. Using MP3 compression, the storage for the same song drops to only about 4 MB.
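The Python sketch below puts these numbers together. The 440 Hz tone is assumed purely for illustration; this is a toy model of sampling, not a real analog-to-digital converter.

import math

SAMPLE_RATE = 44_100     # samples per second (CD quality)
BIT_DEPTH = 16           # 2 bytes of precision per sample
CHANNELS = 2             # stereo

def sample_tone(freq_hz, duration_s):
    # Quantize a sine wave into signed 16-bit integer samples.
    n = int(SAMPLE_RATE * duration_s)
    peak = 2 ** (BIT_DEPTH - 1) - 1          # 32767
    return [round(peak * math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE))
            for t in range(n)]

samples = sample_tone(440, 0.001)            # 1 millisecond of a 440 Hz tone
print(len(samples), "samples, first few:", samples[:5])

# Uncompressed storage requirements for CD-quality stereo audio.
bytes_per_second = SAMPLE_RATE * (BIT_DEPTH // 8) * CHANNELS   # 176,400
song_bytes = bytes_per_second * 4 * 60                         # a 4-minute song
print(f"{bytes_per_second:,} bytes per second, about {song_bytes / 1e6:.1f} MB per 4-minute song")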

This makes it much easier to fit music onto a portable listening device and has led to a dramatic change in how people purchase and listen to music. MPEG-1 or MPEG-2 Audio Layer 3, more commonly referred to as MP3, is a patented digital audio encoding format that uses a form of lossy data compression. It is a common audio format for consumer audio storage, as well as a de facto standard of digital audio compression for the transfer and playback of music on digital audio players. MP3 is an audio-specific format that was designed by the Moving Picture Experts Group as part of its MPEG-1 standard and later extended in the MPEG-2 standard. The lossy compression algorithm used in MP3 is designed to greatly reduce the amount of data required to represent the audio recording while still sounding like a faithful reproduction of the original uncompressed audio to most listeners. An MP3 file created at a setting of 128 kbit/s will be about 11 times smaller than the CD file created from the original audio source. An MP3 file can also be created at higher or lower bit rates, with higher or lower resulting quality. The compression works by reducing the accuracy of those parts of the sound that are deemed beyond the auditory resolution of most people. This method is commonly referred to as perceptual coding: it uses psychoacoustic models to discard, or reduce the precision of, components less audible to human hearing, and then records the remaining information in an efficient manner. When performing lossy audio encoding, such as creating an MP3 file, there is a trade-off between the amount of space used and the sound quality of the result. Typically, the creator sets a bit rate, which specifies how many kilobits the file may use per second of audio. The higher the bit rate, the larger the compressed file will be and, generally, the closer it will sound to the original. With too low a bit rate, compression artifacts (i.e. sounds that were not present in the original recording) may be audible in the reproduction. Some audio is hard to compress because of its randomness and sharp attacks; when this type of audio is compressed, artifacts such as ringing or pre-echo are usually heard. A sample of applause compressed at a relatively low bit rate provides a good example of such artifacts. Besides the bit rate of an encoded piece of audio, the quality of MP3 files also depends on the quality of the encoder itself and on the difficulty of the signal being encoded. As the MP3 standard allows quite a bit of freedom in encoding algorithms, different encoders may produce quite different quality, even at identical bit rates. As an example, in a public listening test featuring two different MP3 encoders at about 128 kbit/s, one scored 3.66 on a 1-to-5 scale, while the other scored only 2.22. Quality depends on the choice of encoder and encoding parameters. However, in 1998, MP3 at 128 kbit/s was already providing lower quality than AAC at 96 kbit/s.
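The rough arithmetic behind the "about 11 times smaller" figure can be checked in Python (a back-of-the-envelope sketch; real MP3 files also contain headers and metadata):

CD_BYTES_PER_SECOND = 176_400          # 44,100 samples/s x 2 bytes x 2 channels

def mp3_bytes_per_second(bitrate_kbps):
    # A constant-bit-rate stream stores bitrate_kbps kilobits per second of audio.
    return bitrate_kbps * 1000 / 8

for kbps in (96, 128, 192, 320):
    ratio = CD_BYTES_PER_SECOND / mp3_bytes_per_second(kbps)
    print(f"{kbps} kbit/s -> roughly {ratio:.1f} times smaller than CD audio")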
Resources: http://www.cs.utk.edu/modules/; http://en.wikipedia.org/wiki/Mp3

