
ISO/IEC 8859-1

ISO/IEC 8859-1:1998
MIME name: ISO-8859-1
Alias(es): iso-ir-100, csISOLatin1, latin1, l1, IBM819, CP819

Standard: ISO/IEC 8859

ISO/IEC 8859-1:1998, Information technology: 8-bit single-byte coded graphic character sets, Part 1: Latin alphabet No. 1, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings; its first edition was published in 1987. It is generally intended for Western European languages (see below for a list). It is the basis for most popular 8-bit character sets, including Windows-1252, and for the first block of characters in Unicode.

ISO-8859-1 is the IANA preferred name for this standard when supplemented with the C0 and C1 control codes from ISO/IEC 6429. The following other aliases are registered for ISO-8859-1: iso-ir-100, csISOLatin1, latin1, l1, IBM819, CP819.

The Windows-1252 code page coincides with ISO-8859-1 for all codes except the range 128 to 159 (hex 80 to 9F), where the little-used C1 controls are replaced with additional characters, including all the missing characters provided by ISO-8859-15. Code page 28591, also known as Windows-28591, is the actual ISO-8859-1 code page.
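The relationship between the two code pages can be illustrated with a short Python sketch. It assumes that Python's built-in "latin-1" and "cp1252" codecs are acceptable stand-ins for ISO-8859-1 and Windows-1252; the helper name differing_bytes is hypothetical. Python's cp1252 table leaves a few byte values undefined, and the sketch counts those as differences too.

```python
def differing_bytes(enc_a="latin-1", enc_b="cp1252"):
    """Return the byte values that the two codecs decode differently."""
    diffs = []
    for value in range(256):
        raw = bytes([value])
        a = raw.decode(enc_a)  # latin-1 accepts every byte value
        try:
            b = raw.decode(enc_b)
        except UnicodeDecodeError:
            b = None  # byte is undefined in enc_b (assumed codec behaviour)
        if a != b:
            diffs.append(value)
    return diffs


diffs = differing_bytes()
print(f"{len(diffs)} differing bytes, 0x{diffs[0]:02X} to 0x{diffs[-1]:02X}")

# The same byte is a C1 control in one codec and the euro sign in the other.
print(ascii(b"\x80".decode("latin-1")), ascii(b"\x80".decode("cp1252")))
```

Every byte outside the reported range decodes to the same character under both codecs, which is why mislabelled text often looks correct until a character from that range appears.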

Coverage
ISO 8859-1 encodes what it refers to as "Latin alphabet no. 1," consisting of 191 characters from the Latin script. This character-encoding scheme is used throughout the Americas, Western Europe, Oceania, and much of Africa. It is also commonly used in most standard romanizations of East-Asian languages. Each character is encoded as a single eight-bit code value. These code values can be used in almost any data interchange system to communicate in the following European languages (with a few exceptions due to missing characters, as noted):

Languages with complete coverage


Afrikaans, Albanian, Basque, Breton, Catalan, Corsican, Danish, English (UK and US), Faroese, Galician, German, Icelandic, Indonesian, Irish (new orthography), Italian, Latin (basic classical orthography), Leonese, Luxembourgish (basic classical orthography), Malay, Manx, Norwegian (Bokmål and Nynorsk), Occitan, Portuguese, Rhaeto-Romanic, Scottish Gaelic, Spanish, Swahili, Swedish, Walloon
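The figure of 191 characters and the one-byte-per-character property described under Coverage can be checked with a minimal Python sketch. It assumes that Python's "latin-1" codec matches ISO-8859-1; the sample phrase is only an illustration.

```python
# Graphic characters occupy 0x20-0x7E and 0xA0-0xFF; the remaining
# positions are control codes or unassigned in the standard itself.
graphic = [b for b in range(256) if 0x20 <= b <= 0x7E or 0xA0 <= b <= 0xFF]
print(len(graphic))  # 95 + 96 = 191

# Each character of a covered language is stored as exactly one byte.
text = "Grüße aus Köln"
encoded = text.encode("latin-1")
print(len(text) == len(encoded))
```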


Languages commonly supported but with incomplete coverage


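Each entry gives the language, the missing characters, the typical workaround, and the encodings that support the language, where these are listed.

Catalan: missing Ŀ, ŀ (deprecated); typical workaround: L·, l·.
Czech: missing Č, č, Ď, ď, Ě, ě, Ň, ň, Ř, ř, Š, š, Ť, ť, Ů, ů, Ž, ž; supported by ISO-8859-2, Windows-1250.
Dutch: missing Ĳ, ĳ; typical workaround: digraphs IJ, ij.
Estonian: missing Š, š, Ž, ž (only present in loanwords); typical workaround: Sh, sh, Zh, zh; supported by ISO-8859-15, Windows-1252.
Finnish: missing Š, š, Ž, ž (only present in loanwords); typical workaround: Sh, sh, Zh, zh; supported by ISO-8859-15, Windows-1252.
French: missing Œ, œ, and the very rare Ÿ; typical workaround: digraphs OE, oe, and Y without the diaeresis; supported by ISO-8859-15, Windows-1252.
Hungarian: missing Ő, ő, Ű, ű; typical workaround: Õ, õ, Û, û or Ö, ö, Ü, ü; supported by ISO-8859-2, Windows-1250.
Irish (traditional orthography): missing Ḃ, ḃ, Ċ, ċ, Ḋ, ḋ, Ḟ, ḟ, Ġ, ġ, Ṁ, ṁ, Ṡ, ṡ, Ṫ, ṫ; typical workaround: Bh, bh, Ch, ch, Dh, dh, Fh, fh, Gh, gh, Mh, mh, Sh, sh, Th, th; supported by ISO-8859-14.
Latin with macrons: missing Ā, ā, Ē, ē, Ī, ī, Ō, ō, Ū, ū; supported by ISO-8859-13, Windows-1257.
Māori: missing Ā, ā, Ē, ē, Ī, ī, Ō, ō, Ū, ū; supported by ISO-8859-13, Windows-1257.
Turkish: missing İ, ı, Ğ, ğ, Ş, ş; typical workaround: I, i, G, g, S, s; supported by ISO-8859-3, ISO-8859-9, Windows-1254.
Welsh: missing Ẁ, ẁ, Ẃ, ẃ, Ŵ, ŵ, Ẅ, ẅ, Ỳ, ỳ, Ŷ, ŷ, Ÿ; supported by ISO-8859-14.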
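Whether a given text falls within the ISO-8859-1 repertoire can be tested mechanically. The following Python sketch assumes the built-in "latin-1" codec; the function name missing_from_latin1 and the sample words are hypothetical illustrations, not part of any standard.

```python
def missing_from_latin1(text):
    """Return the distinct characters of `text` that ISO-8859-1 cannot
    encode, in order of first appearance."""
    missing = []
    for ch in text:
        try:
            ch.encode("latin-1")
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


samples = {"French": "œuvre", "Turkish": "İstanbul", "German": "Straße"}
for language, word in samples.items():
    codes = [f"U+{ord(c):04X}" for c in missing_from_latin1(word)]
    print(language, codes)
```

An empty result means the text can be stored in ISO-8859-1 without a workaround; a non-empty result lists the code points that would need a substitute such as a digraph.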

Quotation marks
For some languages listed above the correct typographical quotation marks are missing, as only « », " ", and ' ' are included. This scheme also does not provide oriented (6- or 9-shaped) single or double quotation marks. Some fonts display the spacing grave accent (0x60) and the apostrophe (0x27) as a matching pair of oriented single quotation marks, but this is not considered part of the modern standard.

History
ISO 8859-1 was based on the Multinational Character Set used by Digital Equipment Corporation in the popular VT220 terminal. It was developed within ECMA, the European Computer Manufacturers Association, and published in March 1985 as ECMA-94, by which name it is still sometimes known. The second edition of ECMA-94 [2] (June 1986) also included ISO 8859-2, ISO 8859-3, and ISO 8859-4 as part of the specification.

In 1985 Commodore adopted ISO 8859-1 for its new AmigaOS operating system. The Seikosha MP-1300AI impact dot-matrix printer, used with the Amiga 1000, included this encoding. [citation needed]

In 1992, the IANA registered the character map ISO_8859-1:1987, more commonly known by its preferred MIME name of ISO-8859-1 (note the extra hyphen compared with ISO 8859-1), a superset of ISO 8859-1, for use on the Internet. This map assigns the C0 and C1 control characters to the unassigned code values and thus provides for 256 characters via every possible 8-bit value.

ISO-8859-1 is (according to the standards at least) the default encoding of documents delivered via HTTP with a MIME type beginning with "text/". However, the draft HTML 5 specification requires that documents advertised as ISO-8859-1 actually be parsed with the Windows-1252 encoding. [3] ISO-8859-1 is also the default encoding of the values of certain descriptive HTTP headers, and defines the repertoire of characters allowed in HTML 3.2 documents (HTML 4.0, however, is based on Unicode). ISO-8859-1 and Windows-1252 are often assumed to be the encoding of text on Unix and Microsoft Windows in the absence of locale or other information; this assumption is only gradually being replaced by Unicode encodings such as UTF-8 or UTF-16.
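Because the IANA map assigns a character to every 8-bit value, any byte string can be decoded as ISO-8859-1, whereas the same bytes may be invalid in a Unicode encoding such as UTF-8. A minimal Python sketch, assuming the built-in "latin-1" and "utf-8" codecs:

```python
# Every one of the 256 byte values decodes, and each decodes to the
# code point with the same number.
all_bytes = bytes(range(256))
text = all_bytes.decode("latin-1")
code_points = [ord(ch) for ch in text]
print(code_points == list(range(256)))
print(text.encode("latin-1") == all_bytes)  # lossless round trip

# The single byte 0xE9 is a complete character in ISO-8859-1 but an
# incomplete sequence in UTF-8.
try:
    b"\xe9".decode("utf-8")
    utf8_accepts = True
except UnicodeDecodeError:
    utf8_accepts = False
print(utf8_accepts)
```

This is one reason a wrong guess in favour of ISO-8859-1 never raises an error: the decoder accepts anything, so mistakes show up only as wrong characters.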

Codepage layout
ISO/IEC 8859-1

    _0   _1   _2   _3   _4   _5   _6   _7   _8   _9   _A   _B   _C   _D   _E   _F
0_  (unassigned)
1_  (unassigned)
2_  SP   !    "    #    $    %    &    '    (    )    *    +    ,    -    .    /
3_  0    1    2    3    4    5    6    7    8    9    :    ;    <    =    >    ?
4_  @    A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
5_  P    Q    R    S    T    U    V    W    X    Y    Z    [    \    ]    ^    _
6_  `    a    b    c    d    e    f    g    h    i    j    k    l    m    n    o
7_  p    q    r    s    t    u    v    w    x    y    z    {    |    }    ~
8_  (unassigned)
9_  (unassigned)
A_  NBSP ¡    ¢    £    ¤    ¥    ¦    §    ¨    ©    ª    «    ¬    SHY  ®    ¯
B_  °    ±    ²    ³    ´    µ    ¶    ·    ¸    ¹    º    »    ¼    ½    ¾    ¿
C_  À    Á    Â    Ã    Ä    Å    Æ    Ç    È    É    Ê    Ë    Ì    Í    Î    Ï
D_  Ð    Ñ    Ò    Ó    Ô    Õ    Ö    ×    Ø    Ù    Ú    Û    Ü    Ý    Þ    ß
E_  à    á    â    ã    ä    å    æ    ç    è    é    ê    ë    ì    í    î    ï
F_  ð    ñ    ò    ó    ô    õ    ö    ÷    ø    ù    ú    û    ü    ý    þ    ÿ

The row label gives the high hexadecimal digit of the code value and the column label the low digit. The Unicode code point of each character equals its code value: A (row 4_, column _1) is 0041, decimal 65, and ÿ (row F_, column _F) is 00FF, decimal 255. SP is the space, NBSP the no-break space, and SHY the soft hyphen. Rows 0_, 1_, 8_, and 9_ and position 7F are not assigned to graphic characters.
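The per-character hexadecimal and decimal values of the code page layout follow directly from the row and column position, so the full listing can be regenerated. The sketch below is one way to do this in Python, assuming the built-in "latin-1" codec; the function name latin1_rows and the output format are hypothetical.

```python
def latin1_rows():
    """Yield (row_label, cells) for the rows holding graphic characters.
    Each cell is 'glyph HEX decimal', e.g. 'A 0041 65'."""
    names = {0x20: "SP", 0xA0: "NBSP", 0xAD: "SHY"}  # non-visible characters
    for high in (0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0xA, 0xB, 0xC, 0xD, 0xE, 0xF):
        cells = []
        for low in range(16):
            value = high * 16 + low
            if value == 0x7F:
                continue  # 0x7F is not a graphic character
            glyph = names.get(value, bytes([value]).decode("latin-1"))
            cells.append(f"{glyph} {value:04X} {value}")
        yield f"{high:X}_", cells


for label, cells in latin1_rows():
    print(label, " | ".join(cells))
```

The decimal value is simply 16 times the row digit plus the column digit, which is why no lookup table is needed.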

Similar character sets


ISO-8859-1 was incorporated as the first 256 code points of ISO/IEC 10646 and Unicode. The lower range 32 to 126 (hex 20 to 7E, the G0 subset) maps exactly to the same coded G0 subset of the ISO 646 US variant (commonly known as ASCII), whose ISO 2022 standard switch sequence is "ESC ( B". The higher range 160 to 255 (hex A0 to FF, the G1 subset) maps exactly to the same subset initiated by the ISO 2022 standard switch sequence "ESC . A".

ISO/IEC 8859-1 is missing some characters for French and Finnish text, as well as the euro sign. In order to provide some of these characters, ISO/IEC 8859-15 was developed as an update of ISO/IEC 8859-1. This required, however, the removal of some infrequently used characters from ISO/IEC 8859-1, including fraction symbols and letter-free diacritics: ¤, ¦, ¨, ´, ¸, ¼, ½, and ¾.

The popular Windows-1252 character set adds all the missing characters provided by ISO/IEC 8859-15, plus a number of typographic symbols, by replacing the rarely used C1 controls in the range 128 to 159 (hex 80 to 9F). It is very common to mislabel text data with the charset label ISO-8859-1 even though the data is really Windows-1252 encoded. Many web browsers and e-mail clients will interpret ISO-8859-1 control codes as Windows-1252 characters in order to accommodate such mislabeling, but this is not standard behaviour, and care should be taken to avoid generating these characters in ISO-8859-1 labeled content.

The Apple Macintosh computer introduced a character encoding called Mac Roman, or Mac-Roman, in 1984. It was meant to be suitable for Western European desktop publishing. It is a superset of ASCII, like ISO-8859-1, and has most of the characters that are in ISO-8859-1, but in a totally different arrangement. A later version, registered with IANA as "Macintosh", replaced the generic currency sign ¤ with the euro sign €. The few printable characters that are in ISO 8859-1 but not in this set are often a source of trouble when editing text on websites using older Macintosh browsers (including the last version of Internet Explorer for Mac). However, almost all of the extra characters that Windows-1252 has in the C1 code point range are supported in MacRoman; Š, š, Ž, and ž are the exceptions.

DOS had code page 850, which had all printable characters that ISO-8859-1 had (albeit in a totally different arrangement) plus the most widely used graphic characters from code page 437.
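The overlap between ISO-8859-1 and the related sets discussed above can be explored with a short Python sketch. It assumes that Python's "iso-8859-15", "mac_roman", and "cp850" codecs are faithful enough to the respective character sets (the mac_roman table in particular may reflect the later euro-sign revision); the helper names are hypothetical.

```python
def graphic_latin1():
    """The 96 characters of the upper (G1) half of ISO-8859-1."""
    return [chr(value) for value in range(0xA0, 0x100)]


def unencodable(chars, codec):
    """Return the characters from `chars` that `codec` cannot represent."""
    lost = []
    for ch in chars:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            lost.append(ch)
    return lost


for codec in ("iso-8859-15", "mac_roman", "cp850"):
    lost = unencodable(graphic_latin1(), codec)
    print(codec, len(lost), " ".join(f"U+{ord(c):04X}" for c in lost))
```

Under these assumptions the ISO-8859-15 line lists the eight removed characters, the Mac Roman line lists the characters that tend to cause trouble on older Macintosh software, and the code page 850 line is empty.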

References
[1] http://en.wikipedia.org/w/index.php?title=Template:Infobox_character_encoding&action=edit
[2] http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-094.pdf
[3] HTML 5 Draft Recommendation, 12 April 2010, 8.1 Character encodings (http://dev.w3.org/html5/spec/Overview.html#character-encodings-0), retrieved 2010-04-12.

External links
ISO/IEC 8859-1:1998 (http://www.iso.org/iso/en/CatalogueDetailPage.CatalogueDetail?CSNUMBER=28245&ICS1=35&ICS2=40&ICS3=)
ISO/IEC 8859-1:1998 (ftp://std.dkuug.dk/JTC1/sc2/wg3/docs/n411.pdf) - 8-bit single-byte coded graphic character sets, Part 1: Latin alphabet No. 1 (draft dated February 12, 1998, published April 15, 1998)
Standard ECMA-94 (http://www.ecma-international.org/publications/standards/Ecma-094.htm): 8-Bit Single Byte Coded Graphic Character Sets - Latin Alphabets No. 1 to No. 4, 2nd edition (June 1986)
ISO-IR 100 (http://www.itscj.ipsj.or.jp/ISO-IR/100.pdf) Right-Hand Part of Latin Alphabet No.1 (February 1, 1986)
Windows Code pages (http://msdn.microsoft.com/goglobal/bb964656)
Differences between ANSI, ISO-8859-1 and MacRoman Character Sets (http://www.alanwood.net/demos/charsetdiffs.html)
The Letter Database (http://www.eki.ee/letter/)
The ISO 8859 Alphabet Soup (http://czyborra.com/charsets/iso8859.html) - Roman Czyborra's summary of ISO character sets

Article Sources and Contributors



ISO/IEC 8859-1 Source: http://en.wikipedia.org/w/index.php?oldid=589926607 Contributors: Achurch, Adam78, Adelton, Al shopov, Alxeedo, Amakuha, Andre Engels, Anon user, Anthony, Anrion, Athantor, Auslli, Avjewe, Babak info, Barklund, Basil.bourque, Bearcat, Ben morphett, Bennylin, Bgwhite, BiT, Brion VIBBER, Brycen, Bukzor, Burzuchius, Caoimhin, Ceplm, Cfsenel, Choster, ChrisGualtieri, Christian List, Chrullrich, Circular17, Conversion script, Copyeditor42, Crissov, Curps, CyberSkull, Dakart, DanielPharos, Dbachmann, Deh, Denelson83, Diberri, Docu, Don4of4, Droll, Dthomsen8, Dtobias, Dysprosia, Elektron, Ellmist, Emk (ja), Evertype, Fool4jesus, Furrykef, GPHemsley, Gaius Cornelius, Goh wz, GregorB, Gwinkless, Gyopi, Gtz, Harris7, Harryboyles, Here, Icairns, Incnis Mrsi, Indefatigable, IronGargoyle, Ixfd64, JTN, Jasen betts, Jeronimo, Jkl, John, Jor, Keka, Khukri, Konxykogure, Kooo, Ksn, Kwamikagami, Kwi, LauraALo, Lee Daniel Crocker, Liftarn, Liliana-60, LittleBenW, Livajo, Lmatt, Loadmaster, LoveEncounterFlow, Madacs, Magioladitis, ManuelGR, Martin.Budden, Mat cross, Matthiaspaul, Michael Peter Fustumum, Mikeo, Miles, Mjb, Monedula, Mxn, Mzajac, Naohiro19, Natural Cut, NatusRoma, Nbarth, Nickj, Nikevich, Nikola Smolenski, Nohat, Nsaa, OwenBlacker, Oz1cz, Paddu, Patrick, Paul Magnussen, Pengo, Perey, Phenry, Phil Boswell, PierreAbbat, Pjacobi, Plugwash, Pne, Poccil, Polluks, Poogis, Prof Wrong, Proxyma, QuartierLatin1968, Quota, R'n'B, RARPSL, Raffaele Megabyte, Raise exception, Rama, Red King, RedWolf, Rgrg, Rick Block, RickBeton, Rje, RoToRa, Rogper, Ruhrjung, Ruud Koot, Sandrarossi, Saric, Sburke, Shaun, Simo Kaupinmki, Sl, Sladen, Smb1001, Some jerk on the Internet, Spitzak, Stuartyeates, Stubblyhead, Suruena, TJRC, Tamfos, Tedickey, Telfordbuck, Tevildo, The Nut, Theopolisme, Thistheman, TimR, Timc, Tobias Conradi, Toby Bartels, Torzsmokus, Tox, Truthflux, UTF-8, Urhixidur, Vanisaac, Wavelength, Woohookitty, WorldlyWebster, Wrp103, Yop83, ZanderSchubert, 
ZeroUm, Zundark, var Arnfjr Bjarmason, , 170 anonymous edits

License
Creative Commons Attribution-Share Alike 3.0 //creativecommons.org/licenses/by-sa/3.0/