The Report of the IAB Character Set Workshop held 29 February - 1 March, 1996
RFC 2130 Character Set Workshop Report April 1997
recommendations to the IAB, IANA, and the IESG for furthering the
integration of this framework into text transmission protocols.
The architectural model specifies 7 layers, of which only three are
required for on-the-wire transmission. The Coded Character Set is a
mapping from a set of abstract characters to a set of integers. The
Character Encoding Scheme is a mapping from a Coded Character Set (or
several) to a set of octets. The Transfer Encoding Syntax is a
transformation applied to data which has been encoded using a
Character Encoding Scheme to allow it to be transmitted. These layers
should be specified in a transmitted text stream by using the MIME
encoding mechanisms.
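As an illustration of the three on-the-wire layers, the following Python sketch walks one string through each mapping in turn. It is not part of the workshop report: the function name `layers` is hypothetical, and base64 is assumed as the Transfer Encoding Syntax purely for the example.

```python
import base64


def layers(text):
    """Return (code points, octets, transfer-encoded octets) for text."""
    # Coded Character Set: abstract characters -> integers.
    # ord() yields the Unicode / ISO 10646 code point of each character.
    code_points = [ord(ch) for ch in text]

    # Character Encoding Scheme: integers -> octets (UTF-8 in this sketch).
    octets = text.encode("utf-8")

    # Transfer Encoding Syntax: octets -> a form suitable for transmission.
    # base64 is an assumption here; quoted-printable is another option.
    transfer = base64.b64encode(octets)

    return code_points, octets, transfer


cps, octs, tes = layers("\u00e9")  # LATIN SMALL LETTER E WITH ACUTE
print(cps)   # [233]
print(octs)  # b'\xc3\xa9'
print(tes)   # b'w6k='
```

Note that each layer can be swapped independently: the same code point 233 would yield different octets under a different Character Encoding Scheme, and the same octets would look different under a different Transfer Encoding Syntax.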
This report recommends the use of ISO 10646 as the default Coded
Character Set, and UTF-8 as the default Character Encoding Scheme in
the creation of new protocols or new versions of old protocols which
transmit text. These defaults do not deprecate the use of other
character sets when and where they are needed; they are simply
intended to provide guidance and a specification for
interoperability.
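One way to see how MIME labels these choices is to build a small message with Python's standard `email` package. This is only a sketch: the message body is invented for the example, and base64 is an assumed transfer encoding rather than anything the report mandates.

```python
from email.message import EmailMessage

# Build a text/plain body, labelling the Character Encoding Scheme
# (charset parameter) and the Transfer Encoding Syntax (cte) explicitly.
msg = EmailMessage()
msg.set_content("na\u00efve text", charset="utf-8", cte="base64")

# The charset label travels in the Content-Type header and the transfer
# encoding in Content-Transfer-Encoding, so a receiver can undo both layers.
print(msg["Content-Type"])
print(msg["Content-Transfer-Encoding"])
print(msg.get_content_charset())
```

A receiver that reads these two headers has enough information to reverse the transfer encoding and then decode the octets back into characters.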
1: Introduction
This is the report of an IAB-sponsored invitational workshop on the
use of Character Sets on the Internet, held 29 February - 1 March
1996 at Information Sciences Institute (ISI) in Marina del Rey,
California. In addition, this report covers the discussion on the
mailing list up to and slightly beyond the workshop itself. The
goals of this workshop were to provide guidance to the IAB and the
IETF about the use of character sets on the Internet, and if possible
a common framework for interoperability between the many character
sets in use there. Both goals were achieved.
2: Character sets on the Internet - the problem
The term 'character set' is typically applied to the contents of a
wide variety of text transmission and display protocols used on the
Internet. Because the term is used to mean different things,
confusion has arisen. For example, the MIME registry of character
sets [MIME] contains items that may differ greatly in their
applicability and semantics in various Internet protocols.
In addition, there is a vast profusion of different text encoding
schemes in use on the Internet. This per se is not a problem; each
scheme has evolved to meet real needs. However, information
applications such as mail, directories, and the World Wide Web have
each developed different techniques for dealing with the growing
number of schemes. A robust information architecture for the
Weider, et. al. Informational