
Appendix A

Java Encoding Schemes

This appendix describes the character-encoding schemes that are supported by the Java
platform.

US-ASCII

US-ASCII is a 7-bit character set and encoding that covers the English-language
alphabet. It is not large enough to cover the characters used in other languages, however,
so it is not very useful for internationalization.
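The limitation can be seen from Java itself. The following is a minimal sketch, assuming Java 7 or later (for java.nio.charset.StandardCharsets); the class and method names are hypothetical and chosen only for illustration. It checks whether a string fits in US-ASCII and shows what happens to a character that does not.

```java
import java.nio.charset.StandardCharsets;

public class AsciiDemo {
    // True if every character in the text has a US-ASCII encoding.
    public static boolean isPureAscii(String text) {
        return StandardCharsets.US_ASCII.newEncoder().canEncode(text);
    }

    // Encodes to US-ASCII and decodes again. String.getBytes(Charset)
    // substitutes the charset's replacement byte for unmappable characters.
    public static String roundTrip(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(isPureAscii("cafe"));       // plain English letters
        System.out.println(isPureAscii("caf\u00e9"));  // contains e-acute
        System.out.println(roundTrip("caf\u00e9"));    // e-acute is lost
    }
}
```

Because the accented letter has no 7-bit code, it does not survive the round trip, which is the practical reason US-ASCII is a poor fit for internationalized text.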

ISO-8859-1

ISO-8859-1 is the character set for Western European languages. It is an 8-bit encoding
scheme in which every encoded character takes exactly 8 bits. (The Unicode encodings
described next, on the other hand, are variable-width: some code values are reserved to
signal that a character continues across more than one unit.)
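As a rough illustration of the fixed one-byte-per-character property, the sketch below (Java 7 or later assumed; class and method names are hypothetical) measures the encoded length of a string and checks whether a character belongs to the set at all.

```java
import java.nio.charset.StandardCharsets;

public class Latin1Demo {
    // Number of bytes the text occupies when encoded as ISO-8859-1.
    public static int encodedLength(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    // True if every character in the text exists in ISO-8859-1.
    public static boolean canEncode(String text) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String word = "fa\u00e7ade"; // six characters, one of them accented
        System.out.println(word.length() + " chars, " + encodedLength(word) + " bytes");
        System.out.println(canEncode("\u20ac")); // the euro sign is not in ISO-8859-1
    }
}
```

For any string that the character set can represent, the byte count equals the character count.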

UTF-8

UTF-8 is a variable-width encoding scheme built on 8-bit units. Characters from the
English-language alphabet (the US-ASCII range) are each encoded using a single 8-bit byte.
Characters for other languages are encoded using 2, 3, or even 4 bytes. UTF-8 therefore
produces compact documents for the English language, but for East Asian languages, whose
characters typically take 3 bytes, documents tend to be half again as large as they would
be if they used UTF-16. If the majority of a document’s text is in a Western European
language, then UTF-8 is generally a good choice because it allows for internationalization
while still minimizing the space required for encoding.
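The 1-to-4-byte range can be checked directly. This is a small sketch, assuming Java 7 or later; the class name Utf8LengthDemo and its helper are hypothetical. It encodes a single Unicode code point as UTF-8 and reports how many bytes result.

```java
import java.nio.charset.StandardCharsets;

public class Utf8LengthDemo {
    // Number of bytes UTF-8 needs for a single Unicode code point.
    public static int utf8Length(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        int[] samples = {
            'A',      // English letter
            0x00E9,   // e-acute (Western European)
            0x4E2D,   // a common Chinese ideograph
            0x20000   // a supplementary-plane ideograph
        };
        for (int cp : samples) {
            System.out.printf("U+%04X -> %d byte(s)%n", cp, utf8Length(cp));
        }
    }
}
```

The four samples were picked to land on the four possible lengths, one each.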

UTF-16

UTF-16 is a variable-width encoding scheme built on 16-bit units. It is large enough to
encode every character in the Unicode character set. It uses 16 bits for most characters,
including the commonly used Chinese, Japanese, and Korean ideographs, and a pair of 16-bit
units (32 bits) for supplementary characters such as rarer ideographs. A Western
European-language document that uses UTF-16 will be up to twice as large as the same
document encoded using UTF-8. But documents written in Far Eastern languages will
typically be about one-third smaller using UTF-16.
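The size trade-off between the two encodings can be measured with a short sketch like the one below (Java 7 or later assumed; class and method names are hypothetical). It uses UTF-16BE rather than the plain UTF-16 charset so that the byte-order mark Java writes for UTF-16 is not counted.

```java
import java.nio.charset.StandardCharsets;

public class SizeComparison {
    // Encoded size of the text in UTF-8, in bytes.
    public static int utf8Size(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Encoded size of the text in UTF-16 (big-endian, no byte-order mark).
    public static int utf16Size(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String english = "Hello";
        String japanese = "\u65e5\u672c\u8a9e";   // three common ideographs
        String rare = "\uD840\uDC00";             // U+20000 as a surrogate pair

        for (String s : new String[] {english, japanese, rare}) {
            System.out.println(utf8Size(s) + " bytes in UTF-8, "
                    + utf16Size(s) + " bytes in UTF-16");
        }
    }
}
```

The English sample doubles in size under UTF-16, the three-ideograph sample shrinks from 9 bytes to 6, and the supplementary character costs 4 bytes in either encoding.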

Further Information about Character Encoding

The character set and encoding names recognized by Internet authorities are listed in the
IANA character set registry at http://www.iana.org/assignments/character-sets.

The Java programming language represents characters internally using the Unicode
character set, which provides support for most languages. For storage and transmission
over networks, however, many other character encodings are used. The Java 2 platform
therefore also supports character conversion to and from other character encodings. Any
Java runtime must support the Unicode transformations UTF-8, UTF-16, UTF-16BE, and
UTF-16LE as well as the US-ASCII and ISO-8859-1 character encodings, but most
implementations support many more. For a complete list of the encodings that can be
supported by the Java 2 platform, see
http://download.oracle.com/javase/6/docs/technotes/guides/intl/encoding.doc.html.
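To see which encodings a particular runtime offers, and to convert between two of them, something along these lines may help. It is a sketch only, assuming Java 7 or later; the class CharsetSupport and its transcode helper are hypothetical names, while Charset.isSupported and Charset.availableCharsets are part of java.nio.charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetSupport {
    // Converts bytes from one encoding to another by decoding them to a
    // Java String (Unicode) and re-encoding that String.
    public static byte[] transcode(byte[] input, Charset from, Charset to) {
        return new String(input, from).getBytes(to);
    }

    public static void main(String[] args) {
        // Encodings named in the text as required on every runtime.
        String[] names = {"UTF-8", "UTF-16BE", "UTF-16LE", "ISO-8859-1"};
        for (String name : names) {
            System.out.println(name + " supported: " + Charset.isSupported(name));
        }

        // The full set is implementation-dependent, so only its size is printed.
        System.out.println(Charset.availableCharsets().size() + " charsets available");

        // e-acute is one byte (0xE9) in ISO-8859-1 and two bytes in UTF-8.
        byte[] latin1 = {(byte) 0xE9};
        byte[] utf8 = transcode(latin1, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(utf8.length + " byte(s) after conversion to UTF-8");
    }
}
```

Decoding to a String and re-encoding is the simplest conversion route; for large streams, wrapping the input in an InputStreamReader and the output in an OutputStreamWriter with explicit charsets avoids holding the whole document in memory.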
