
2.

Data Formats

Chapt. 3
ITEC 1011 Introduction to Information Technologies
Introduction
• Examples
Real World Computer
Data Input device Data

Dear Mom: Keyboard 10110010…

Digital
10110010…
camera
pp. 59.-61
ITEC 1011 Introduction to Information Technologies
Format must be appropriate
• The internal representation must be
appropriate for the type of processing to
take place (e.g., text, images, sound)



Rules/Conventions
• Proprietary formats
– Unique to a product or company
– E.g., Microsoft Word, Corel WordPerfect, IBM Lotus Notes
• Standards
– Evolve two ways:
• Proprietary formats become de facto standards (e.g., Adobe
PostScript, Apple QuickTime)
• A committee is struck to solve a problem (e.g., the Moving
Picture Experts Group, MPEG)

pp. 61-62
Standards Organizations
• ISO – International Organization for Standardization
• CSA – Canadian Standards Association
• ANSI – American National Standards Institute
• IEEE – Institute of Electrical and Electronics Engineers
• Etc.



Examples of Standards
Type of Data             Standards
Alphanumeric             ASCII, EBCDIC, Unicode
Image                    JPEG, GIF, PCX, TIFF
Motion picture           MPEG-2, QuickTime
Sound                    Sound Blaster, WAV, AU
Outline graphics/fonts   PostScript, TrueType, PDF


Why Standards?
• Standards are “arbitrary”
• They exist because they are
– Convenient
– Efficient
– Flexible
– Appropriate
– Etc.



Alphanumeric Data
• Problem: Distinguishing between the number 123
(one hundred and twenty-three) and the characters
“123” (one, two, three)
• Four standards for representing letters (alpha) and
numbers
– BCD – Binary-coded decimal
– ASCII – American standard code for information
interchange
– EBCDIC – Extended binary-coded decimal interchange
code
– Unicode
pp. 63-69
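
The difference between the number 123 and the characters “123” can be seen by looking at the stored bits. The following is a minimal sketch, assuming the number fits in one byte and the text is stored as ASCII; the variable names are illustrative only.

```python
# The number 123 and the characters "123" are stored differently.
number = 123
text = "123"

# One byte holding the value one hundred and twenty-three.
number_bits = format(number, "08b")

# One byte per character: the ASCII codes for '1', '2' and '3'.
text_bits = [format(byte, "08b") for byte in text.encode("ascii")]

print(number_bits)  # 01111011
print(text_bits)    # ['00110001', '00110010', '00110011']
```

Converting between the two forms is an explicit step, e.g. `int("123")` or `str(123)`.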
Standard Alphanumeric Formats
• BCD

• ASCII
• EBCDIC
• Unicode



Binary-Coded Decimal (BCD)
• Four bits per digit

Digit  Bit pattern
0      0000
1      0001
2      0010
3      0011
4      0100
5      0101
6      0110
7      0111
8      1000
9      1001

Note: the following bit patterns are not used:
1010, 1011, 1100, 1101, 1110, 1111



Example
• 7093₁₀ = ? (in BCD)

7 0 9 3

0111 0000 1001 0011
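
The digit-by-digit conversion above can be sketched in a few lines. The function name `to_bcd` is hypothetical, and the sketch assumes a non-negative integer.

```python
def to_bcd(number: int) -> str:
    """Return the BCD bit pattern: one 4-bit group per decimal digit."""
    if number < 0:
        raise ValueError("this sketch handles non-negative integers only")
    # Each decimal digit is converted to binary on its own.
    return " ".join(format(int(digit), "04b") for digit in str(number))

print(to_bcd(7093))  # 0111 0000 1001 0011
```

Because every digit is 0 through 9, the patterns 1010 through 1111 never appear in the output.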



The Problem
• Representing text strings, such as
“Hello, world”, in a computer



Codes and Characters
• Each character is coded as a byte
• Most common coding system is ASCII
(Pronounced ass-key)
• ASCII = American National Standard Code
for Information Interchange
• Defined in ANSI document X3.4-1977



ASCII Features
• 7-bit code
• 8th bit is unused (or used for a parity bit)
• 2^7 = 128 codes
• Two general types of codes:
– 95 are “Graphic” codes (displayable on a
console)
– 33 are “Control” codes (control features of the
console or communications channel)
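
The counts above can be checked with a short sketch. It assumes the graphic codes run from space (20 hex) through tilde (7E hex), and it picks even parity for the 8th bit purely as an illustration, since the slide does not say which parity scheme is meant. The name `with_even_parity` is hypothetical.

```python
codes = range(2 ** 7)  # 7 bits give 128 codes, 0 through 127

# Graphic codes: space (0x20) through tilde (0x7E). Everything else is a control code.
graphic = [c for c in codes if 0x20 <= c <= 0x7E]
control = [c for c in codes if c < 0x20 or c == 0x7F]

def with_even_parity(code: int) -> int:
    """Set the 8th bit so that the whole byte has an even number of 1 bits."""
    parity = bin(code).count("1") % 2
    return code | (parity << 7)

print(len(codes), len(graphic), len(control))     # 128 95 33
print(format(with_even_parity(ord("a")), "08b"))  # 11100001
```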



ASCII Chart

       000   001   010   011   100   101   110   111
0000   NULL  DLE   SP    0     @     P     `     p
0001   SOH   DC1   !     1     A     Q     a     q
0010   STX   DC2   "     2     B     R     b     r
0011   ETX   DC3   #     3     C     S     c     s
0100   EOT   DC4   $     4     D     T     d     t
0101   ENQ   NAK   %     5     E     U     e     u
0110   ACK   SYN   &     6     F     V     f     v
0111   BEL   ETB   '     7     G     W     g     w
1000   BS    CAN   (     8     H     X     h     x
1001   HT    EM    )     9     I     Y     i     y
1010   LF    SUB   *     :     J     Z     j     z
1011   VT    ESC   +     ;     K     [     k     {
1100   FF    FS    ,     <     L     \     l     |
1101   CR    GS    -     =     M     ]     m     }
1110   SO    RS    .     >     N     ^     n     ~
1111   SI    US    /     ?     O     _     o     DEL



Reading the chart: the column heading gives the three most significant bits of a code, and the row label gives the four least significant bits.


e.g., ‘a’ = 1100001
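
Looking up a character in the chart amounts to splitting its 7-bit code into a column part and a row part. A minimal sketch, with the hypothetical helper name `chart_position`:

```python
def chart_position(char: str) -> tuple[str, str]:
    """Split a 7-bit ASCII code into chart column (high 3 bits) and row (low 4 bits)."""
    code = ord(char)
    if code > 0x7F:
        raise ValueError("not a 7-bit ASCII character")
    return format(code >> 4, "03b"), format(code & 0x0F, "04b")

column, row = chart_position("a")
print(column, row)  # 110 0001
```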



95 Graphic codes: columns 010 through 111 of the chart, from space (0100000) to ~ (1111110)


33 Control codes: columns 000 and 001 of the chart (NULL through US), plus DEL (1111111)


Alphabetic codes: A-Z (1000001 through 1011010) and a-z (1100001 through 1111010)


Numeric codes: 0-9 (0110000 through 0111001)


Punctuation, etc.: the graphic codes that are neither letters nor digits, e.g. ! " # $ % & ( ) * + , - . /


“Hello, world” Example

Character   Binary     Hexadecimal   Decimal
H           01001000   48            72
e           01100101   65            101
l           01101100   6C            108
l           01101100   6C            108
o           01101111   6F            111
,           00101100   2C            44
(space)     00100000   20            32
w           01110111   77            119
o           01101111   6F            111
r           01110010   72            114
l           01101100   6C            108
d           01100100   64            100
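
The same table can be generated rather than looked up. This is a small sketch; `ascii_row` is a hypothetical helper name.

```python
def ascii_row(char: str) -> str:
    """Format one character with its code in binary, hexadecimal and decimal."""
    code = ord(char)
    return f"{char} = {code:08b} = {code:02X} = {code}"

for char in "Hello, world":
    print(ascii_row(char))
```

Both occurrences of the letter o produce the same row, since a character always maps to the same code.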



Common Control Codes
Code    Hexadecimal code   Meaning
CR      0D                 carriage return
LF      0A                 line feed
HT      09                 horizontal tab
DEL     7F                 delete
NULL    00                 null
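
Most programming languages let these control codes be written with backslash escapes. The sketch below uses Python string escapes to confirm the hexadecimal values listed above; the dictionary name is illustrative.

```python
# Control codes written as Python string escapes.
control_codes = {"CR": "\r", "LF": "\n", "HT": "\t", "DEL": "\x7f", "NULL": "\x00"}

for name, char in control_codes.items():
    # ord() gives the numeric code; 02X prints it as two hexadecimal digits.
    print(f"{name:<5}{ord(char):02X}")
```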



Terminology
• Learn the names of the special symbols
– [] brackets
– {} braces
– () parentheses
– @ commercial ‘at’ sign
– & ampersand
– ~ tilde



Escape Sequences
• Extend the capability of the ASCII code set
• For controlling terminals and formatting output
• Defined by ANSI in documents X3.41-1974 and
X3.64-1977
• The escape code is ESC = 1B₁₆
• An escape sequence begins with two codes:

ESC    [
1B₁₆   5B₁₆
Examples
• Erase display: ESC [ 2 J
• Erase line: ESC [ K
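
The two example sequences can be built as ordinary strings. This is a sketch only: the constant names are hypothetical, and whether a given terminal acts on the sequences depends on the terminal.

```python
ESC = "\x1b"       # the escape code, 1B hex
START = ESC + "["  # the two codes that begin a sequence: 1B 5B

ERASE_DISPLAY = START + "2J"
ERASE_LINE = START + "K"

# Show the codes that would be sent to the terminal.
print(ERASE_DISPLAY.encode("ascii").hex(" "))  # 1b 5b 32 4a
print(ERASE_LINE.encode("ascii").hex(" "))     # 1b 5b 4b
```

Writing `ERASE_DISPLAY` to a terminal that supports these sequences would clear the screen instead of printing visible characters.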



EBCDIC
• Extended BCD Interchange Code
(pronounced ebb’-se-dick)
• 8-bit code
• Developed by IBM
• Rarely used today
• IBM mainframes only
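
To see how EBCDIC differs from ASCII, the same text can be encoded both ways. The sketch assumes Python's `cp037` codec, which is one EBCDIC code page (US/Canada) among several; the slides do not name a specific code page.

```python
text = "Hello"

ascii_bytes = text.encode("ascii")
ebcdic_bytes = text.encode("cp037")  # cp037: one EBCDIC code page, chosen for illustration

print(ascii_bytes.hex(" "))   # 48 65 6c 6c 6f
print(ebcdic_bytes.hex(" "))  # c8 85 93 93 96
```

The byte values differ for every letter, so text must be converted when it moves between an EBCDIC system and an ASCII system.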



Unicode
• 16-bit standard
• Developed by a consortium
• Intended to supersede older 7- and 8-bit
codes



Unicode Version 2.1
• 1998
• Improves on version 2.0
• Includes the Euro sign (20AC₁₆ = €)
• From the standard:
…contains 38,887 distinct coded characters derived
from the supported scripts. These characters cover the
principal written languages of the Americas, Europe,
the Middle East, Africa, India, Asia, and Pacifica.
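
The Euro sign makes a convenient check of a 16-bit code. In this sketch, UTF-16 (big-endian) is used only as one way to display the 16-bit value; the slides do not name an encoding.

```python
euro = chr(0x20AC)  # the character with code 20AC hex

print(euro, format(ord(euro), "016b"))  # € 0010000010101100
print(euro.encode("utf-16-be").hex())   # 20ac
```

The code does not fit in 7 or 8 bits, so the character has no ASCII representation.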

http://www.unicode.org
Keyboard Input
• Key (“scan”) codes are converted to ASCII
• ASCII code sent to host computer
• Received by the host as a “stream” of data
• Stored in buffer
• Processed
• Etc.

p. 69
Shift Key
• Inhibits (clears) bit 5 in the ASCII code

Key(s)    ASCII code (bits 6 5 4 3 2 1 0)   Character
a         1 1 0 0 0 0 1                     a
Shift a   1 0 0 0 0 0 1                     A
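
Clearing bit 5 is a single bitwise operation. The sketch below applies it to lowercase letters only, which is the case the slide shows; the function name `shift` is hypothetical.

```python
def shift(char: str) -> str:
    """Clear bit 5 (value 0x20) of a lowercase letter's ASCII code."""
    return chr(ord(char) & ~0x20)

print(format(ord("a"), "07b"), "->", format(ord(shift("a")), "07b"), shift("a"))
# 1100001 -> 1000001 A
```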



Control Key
• Inhibits (clears) bits 5 & 6 in the ASCII code

Key(s)    ASCII code (bits 6 5 4 3 2 1 0)   Character
c         1 1 0 0 0 1 1                     c
Ctrl c    0 0 0 0 0 1 1                     ETX (control code)
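
Clearing bits 5 and 6 leaves only the low five bits of the code. A minimal sketch, with the hypothetical function name `ctrl`:

```python
def ctrl(char: str) -> int:
    """Clear bits 5 and 6 of a letter's ASCII code, keeping the low five bits."""
    return ord(char) & 0x1F

print(format(ctrl("c"), "07b"), hex(ctrl("c")))  # 0000011 0x3
```

The same rule gives Ctrl m = 0D (CR) and Ctrl j = 0A (LF), matching the chart.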
Other Input
• OCR – optical character recognition
• Bar code readers
• Voice/audio input
• Punched cards
• Images / objects
• Pointing devices

pp. 69-86
OCR

Page of text (Hello, world) → Optical scan → Computer file (10110110…)

Bar Codes
• An automatic identification (Auto ID)
technology that streamlines identification
and data collection
• See
http://www.digital.net/barcoder/barcode.html

Voice/audio Input
• Input device: microphone
• Audio input is “digitized” and stored
• Processed in two ways
– As is (no recognition)
– Recognized and converted to alphanumeric data
(ASCII)

Audio → Digitize → 10110010…
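
Digitizing means measuring the signal at regular intervals and storing each measurement as a number. The sketch below is an assumption-laden illustration, not a description of any particular sound card: it samples a pure sine tone and stores each sample as an unsigned 8-bit value. The function name, the 1000 Hz tone and the 8000 Hz sample rate are all chosen for the example.

```python
import math

def digitize(frequency_hz: float, sample_rate_hz: int, n_samples: int) -> bytes:
    """Sample a sine tone and quantize each sample to an unsigned 8-bit value (0-255)."""
    samples = []
    for n in range(n_samples):
        # Amplitude of the tone at sample n, between -1.0 and 1.0.
        amplitude = math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz)
        # Map -1.0..1.0 onto 0..255 and round to the nearest whole number.
        samples.append(round((amplitude + 1) / 2 * 255))
    return bytes(samples)

data = digitize(frequency_hz=1000, sample_rate_hz=8000, n_samples=8)
print([format(byte, "08b") for byte in data])
```

The resulting bytes can be stored as is, or passed to recognition software that converts speech to alphanumeric data.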

Punched Cards
• Invented by Herman Hollerith (founder of the Tabulating
Machine Company, a forerunner of IBM)
• Each card holds 80 characters

Images
• Typically images are pictures that are
optically scanned and saved as a “bit map”
or in some other format
• Many formats
– gif, jpeg, …



Typical “Save As” Dialog



Objects
• Images made of geometrically definable
shapes
• Offer efficiency, flexibility, small size, etc.

Pointing Devices
• Originally used for specifying coordinates
(x, y) for graphical input
• Today used as general purpose device for
“graphical user interfaces” (GUIs)



Thank you
