Timestamp:
Mar 13, 2011, 8:25:25 PM
Author:
fielding@…
Message:

Use the term octet instead of character where we are talking
about parsing, since HTTP must be parsed as octets. Make sure
there is no ambiguity about how many spaces are allowed on a
request-line or response-line.

Use the term character encoding instead of character set
because it is annoying to use the term incorrectly regardless
of how MIME does it.

File:
1 edited

  • draft-ietf-httpbis/latest/p3-payload.xml

--- draft-ietf-httpbis/latest/p3-payload.xml (r1173)
+++ draft-ietf-httpbis/latest/p3-payload.xml (r1176)
@@ -336,35 +336,12 @@
 <section title="Protocol Parameters" anchor="protocol.parameters">
 
-<section title="Character Sets" anchor="character.sets">
-<t>
-   HTTP uses the same definition of the term "character set" as that
-   described for MIME:
-</t>
-<t>
-   The term "character set" is used in this document to refer to a
-   method used with one or more tables to convert a sequence of octets
-   into a sequence of characters. Note that unconditional conversion in
-   the other direction is not required, in that not all characters might
-   be available in a given character set and a character set might provide
-   more than one sequence of octets to represent a particular character.
-   This definition is intended to allow various kinds of character
-   encoding, from simple single-table mappings such as US-ASCII to
-   complex table switching methods such as those that use ISO-2022's
-   techniques. However, the definition associated with a MIME character
-   set name &MUST; fully specify the mapping to be performed from octets
-   to characters. In particular, use of external profiling information
-   to determine the exact mapping is not permitted.
-</t>
-<x:note>
-  <t>
-    <x:h>Note:</x:h> This use of the term "character set" is more commonly
-    referred to as a "character encoding". However, since HTTP and
-    MIME share the same registry, it is important that the terminology
-    also be shared.
-  </t>
-</x:note>
+<section title="Character Encodings (charset)" anchor="character.sets">
+<t>
+   HTTP uses charset names to indicate the character encoding of a
+   textual representation.
+</t>
 <t anchor="rule.charset">
   <x:anchor-alias value="charset"/>
-   HTTP character sets are identified by case-insensitive tokens. The
+   A character encoding is identified by a case-insensitive token. The
    complete set of tokens is defined by the IANA Character Set registry
    (<eref target="http://www.iana.org/assignments/character-sets"/>).
@@ -376,7 +353,7 @@
    Although HTTP allows an arbitrary token to be used as a charset
    value, any token that has a predefined value within the IANA
-   Character Set registry &MUST; represent the character set defined
+   Character Set registry &MUST; represent the character encoding defined
    by that registry. Applications &SHOULD; limit their use of character
-   sets to those defined by the IANA registry.
+   encodings to those defined within the IANA registry.
 </t>
 <t>
@@ -1094,8 +1071,9 @@
 <t>
    The "Accept-Charset" header field can be used by user agents to
-   indicate what response character sets are acceptable. This field allows
+   indicate what character encodings are acceptable in a response
+   payload. This field allows
    clients capable of understanding more comprehensive or special-purpose
-   character sets to signal that capability to a server which is capable of
-   representing documents in those character sets.
+   character encodings to signal that capability to a server which is capable of
+   representing documents in those character encodings.
 </t>
 <figure><artwork type="abnf2616"><iref primary="true" item="Grammar" subitem="Accept-Charset"/><iref primary="true" item="Grammar" subitem="Accept-Charset-v"/>
@@ -1106,7 +1084,8 @@
 </artwork></figure>
 <t>
-   Character set values are described in <xref target="character.sets"/>. Each charset &MAY;
-   be given an associated quality value which represents the user's
-   preference for that charset. The default value is q=1. An example is
+   Character encoding values (a.k.a., charsets) are described in
+   <xref target="character.sets"/>. Each charset &MAY; be given an
+   associated quality value which represents the user's preference
+   for that charset. The default value is q=1. An example is
 </t>
 <figure><artwork type="example">
@@ -1115,7 +1094,7 @@
 <t>
    The special value "*", if present in the Accept-Charset field,
-   matches every character set (including ISO-8859-1) which is not
+   matches every character encoding (including ISO-8859-1) which is not
    mentioned elsewhere in the Accept-Charset field. If no "*" is present
-   in an Accept-Charset field, then all character sets not explicitly
+   in an Accept-Charset field, then all character encodings not explicitly
    mentioned get a quality value of 0, except for ISO-8859-1, which gets
    a quality value of 1 if not explicitly mentioned.
@@ -1123,5 +1102,5 @@
 <t>
    If no Accept-Charset header field is present, the default is that any
-   character set is acceptable. If an Accept-Charset header field is present,
+   character encoding is acceptable. If an Accept-Charset header field is present,
    and if the server cannot send a response which is acceptable
    according to the Accept-Charset header field, then the server &SHOULD; send
@@ -2544,10 +2523,11 @@
    Where it is possible, a proxy or gateway from HTTP to a strict MIME
    environment &SHOULD; translate all line breaks within the text media
-   types described in <xref target="canonicalization.and.text.defaults"/> of this document to the RFC 2049
+   types described in <xref target="canonicalization.and.text.defaults"/>
+   of this document to the RFC 2049
    canonical form of CRLF. Note, however, that this might be complicated
    by the presence of a Content-Encoding and by the fact that HTTP
-   allows the use of some character sets which do not use octets 13 and
-   10 to represent CR and LF, as is the case for some multi-byte
-   character sets.
+   allows the use of some character encodings which do not use octets 13 and
+   10 to represent CR and LF, respectively, as is the case for some multi-byte
+   character encodings.
 </t>
 <t>
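The Accept-Charset rules in the diff above (default q=1, the "*" wildcard, the special treatment of ISO-8859-1, and the default when the field is absent) can be sketched in Python as follows. This is a minimal illustration, not code from the draft: the function names parse_accept_charset and charset_quality are hypothetical, charset names are compared case-insensitively but literally (registry aliases such as "latin1" are not resolved), and malformed q-values are not validated.

```python
from typing import Dict, Optional


def parse_accept_charset(value: str) -> Dict[str, float]:
    """Map each (lowercased) charset in an Accept-Charset value to its q-value."""
    prefs: Dict[str, float] = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # skip empty list elements such as "a, , b"
        name, *params = [part.strip() for part in item.split(";")]
        q = 1.0  # the default quality value is q=1
        for param in params:
            key, _, val = param.partition("=")
            if key.strip().lower() == "q":
                q = float(val.strip())  # assumption: no validation of q-value syntax
        prefs[name.lower()] = q  # charset tokens are case-insensitive
    return prefs


def charset_quality(header: Optional[str], charset: str) -> float:
    """Return the q-value that an Accept-Charset field assigns to charset."""
    if header is None:
        return 1.0  # no Accept-Charset field: any encoding is acceptable
    prefs = parse_accept_charset(header)
    name = charset.lower()
    if name in prefs:
        return prefs[name]  # explicitly mentioned
    if "*" in prefs:
        return prefs["*"]  # "*" covers everything not mentioned, ISO-8859-1 included
    if name == "iso-8859-1":
        return 1.0  # unmentioned ISO-8859-1 still gets q=1 when no "*" is present
    return 0.0  # any other unmentioned encoding gets q=0
```

For example, with the field value iso-8859-5, unicode-1-1;q=0.8, charset_quality returns 1.0 for "ISO-8859-5", 0.8 for "unicode-1-1", 1.0 for the unmentioned "iso-8859-1", and 0.0 for "utf-8". Choosing among the encodings a server can actually produce is deliberately left out of the sketch.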
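The last hunk notes that translating line breaks to CRLF is complicated by encodings that do not use octets 13 and 10 for CR and LF. One way to sidestep that, sketched here under assumptions, is to decode the text first, normalize line breaks on characters rather than octets, and re-encode. The function name to_mime_canonical is hypothetical; the sketch assumes any Content-Encoding has already been removed from the body and that the charset name is one Python's codec registry accepts (IANA and Python codec names do not always coincide).

```python
import re

# CRLF is listed first so an existing CRLF pair is matched as one line break.
_LINE_BREAK = re.compile(r"\r\n|\r|\n")


def to_mime_canonical(body: bytes, charset: str) -> bytes:
    """Rewrite every line break in a text body as CRLF, in the same encoding."""
    text = body.decode(charset)  # work on characters: octets 13/10 may not be CR/LF
    text = _LINE_BREAK.sub("\r\n", text)  # bare CR, bare LF, and CRLF all become CRLF
    return text.encode(charset)
```

In UTF-16LE, for instance, "a\nb" is the octets 61 00 0A 00 62 00. Replacing octet 0A with 0D 0A at the byte level yields a sequence that no longer decodes as the intended text, whereas to_mime_canonical produces 61 00 0D 00 0A 00 62 00.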